Abstract

We examine genetic statistics used in the study of structured populations. In a 1999
paper, Wakeley observed that the coalescent process associated with the finite island
model can be decomposed into a scattering phase and a collecting phase. In this paper,
we introduce a class of population structure models, which we refer to as G/KC models,
that obey such a decomposition. In a large population, large sample limit we derive
the distribution of the statistic Fst for all G/KC models under the assumptions of strong
or weak mutation. We show that in the large population, large sample limit the island
and two-dimensional stepping stone models are members of the G/KC class of models,
thereby deriving the distributions of Fst for these two well-known models as a special case
of a general formula. We show that our analysis of Fst can be extended to an entire class
of genetic statistics, and we use our approach to examine homozygosity measures. Our
analysis uses coalescent-based methods.

1 Introduction 

Biological populations are often divided into subpopulations between which migration is 
restricted. Such populations, referred to as structured populations, have been an important 
area of population genetics research since the 1930s [31]. In application, various statistics
based on genetic data are used in hypothesis testing to understand structured populations. An
example of such a genetic statistic is Fst. Fst, which we define precisely below, is used to test
for the presence of population structure and to estimate migration rates [27; 28; 32].

The analysis of Fst has a long history that reflects the history of population genetics. Fst
was introduced by Wright in the context of single locus, biallelic data [32]. Over time, Fst
was generalized to multiple loci, multiple allele data (e.g. [11; 27]) and to sequence data
(e.g. [13]). Initially, Wright considered Fst under the infinite island model for population
structure. Over time, Fst was analyzed under the finite island model (e.g. [19; 23]), stepping
stone models (e.g. [3]), and some more general population structure models (e.g. [30]). The
method of analysis of Fst moved from frequency-based methods to coalescent methods (e.g.
ISEII22']). 

But today, the distribution of Fst is still poorly understood. The distribution of Fst is
known only for the island model in the case of single locus, multiallelic data [11]. How the
distribution of Fst changes under different models of population structure and genetic data is
not known. Fst, in all its forms, is just one example of a general problem. We know very
little about the distribution of genetic statistics under population structure, and what we know
about these statistics is confined to very specific models. In application, this lack of knowledge
has important consequences. First, since distributions are not known, the construction
of confidence intervals can only be done through resampling techniques [26]. Second, since
results are not generalizable beyond specific models, hypothesis tests assume a null hypothesis
that includes a specific form of population structure. By including such assumptions the
utility of hypothesis testing is severely limited [?].

In this paper we address some of these issues by analyzing Fst and other genetic statistics
over a class of population structure models which we call G/KC models. G/KC models
are limiting versions of models that obey the scattering-collecting phase decomposition introduced
by Wakeley [25]. We consider a large population, large sample limit, thereby removing
statistical variance and focusing on evolutionary variance (see Oal for a discussion of this issue).
In this setting, we derive a formula for the distribution of Fst for any G/KC model under
the assumption of weak or strong mutation. We show that in the large population, large sample
limit, the island and two-dimensional stepping stone models correspond to certain G/KC
models, thereby deriving the distribution of Fst for both the island and stepping stone models
as a special case of the more general formula for G/KC models. We further show that our
approach to the analysis of Fst can be applied to a whole class of genetic statistics which
we refer to as diversity measures and of which Fst is an example. In proving our results we
assume a haploid population of constant size under a Moran mating scheme.

Our analysis uses coalescent-based methods; see |Ql for a good introduction. With this in
mind, we describe the island, stepping stone, and G/KC models by specifying their coalescent
processes. We consider the island and two-dimensional stepping stone models because of
their central role in population genetics. Other models can be analyzed by our methods; see
[15] for a whole class of such models.

The rest of this paper is organized as follows. In section 2 we introduce basic definitions
that we need to present our results. In section 3 we present our results. In section 4 we
apply our results in several different settings of practical interest. We discuss Fst under a
single locus, infinite allele model, under a multilocus, biallelic model, and under an infinite
sites model. We also use our results to compare homozygosity measures under the island and
stepping stone models. Sections 5-7 contain the proofs of the theorems stated in section 3.
Section 5 connects the G/KC coalescent to the island and stepping stone model coalescents,
while sections 6 and 7 prove results concerning Fst.



2 Diversity Measures and Coalescent Models 

In this section we introduce some basic definitions. In subsection 2.1 we give a general
definition for diversity measures and the diversity measure Fst in particular. In subsection
2.2 we introduce the island and stepping stone model coalescent processes along with the
Kingman coalescent. Finally in subsection 2.3 we introduce the G/KC coalescent.

2.1 Diversity Measures 

We consider a population that is separated into D subpopulations. We refer to these subpopulations
as demes. Each deme is composed of individuals, and the population size of each
deme is fixed at N over all times. At time 0, we sample individuals from d demes. From each
sampled deme we sample n individuals, so we sample nd individuals in all.

From each sampled individual we obtain a genetic state. Let $\mathcal{S}$ be the set of all mappings
from $\mathbb{N}$ to $\{0, 1\}$. A genetic state is an element of $\mathcal{S}$. Set

\[ x^{gen}_{k,j} = \text{genetic state of the $j$th sampled individual in the $k$th sampled deme.} \tag{2.1} \]

Then $x^{gen}_{k,j} \in \mathcal{S}$ and $x^{gen}_{k,j}(i) \in \{0, 1\}$. We say that G is a diversity measure if it is a bounded
function of the $x^{gen}_{k,j}$ over $k = 1, 2, \ldots, d$ and $j = 1, \ldots, n$ that is symmetric in $j$ for fixed $k$ and
symmetric in $k$ for fixed $j$.

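Let $\chi(\cdot)$ be the indicator function (i.e. $\chi(\text{true}) = 1$, $\chi(\text{false}) = 0$). We introduce two specific
diversity measures on which our technical analysis focuses: the homozygosity measures
$\phi_0$, $\phi_1$ and Fst. We use the definition and notation given by Nei in [17].

Homozygosity Measures:

\[ \phi_{0,k} = \frac{1}{n^2} \sum_{j,j'=1}^{n} \chi(x^{gen}_{k,j} = x^{gen}_{k,j'}), \tag{2.2} \]

\[ \phi_0 = \frac{1}{d} \sum_{k=1}^{d} \phi_{0,k}, \tag{2.3} \]

\[ \phi_1 = \frac{1}{d^2} \sum_{k,k'=1}^{d} \frac{1}{n^2} \sum_{j,j'=1}^{n} \chi(x^{gen}_{k,j} = x^{gen}_{k',j'}). \tag{2.4} \]

For $\phi_1 \ne 1$,

\[ F_{st} = \frac{\phi_0 - \phi_1}{1 - \phi_1}. \tag{2.5} \]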
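As a concrete illustration of the definitions above, the following minimal sketch computes the homozygosity measures and Fst from a table of sampled genetic states. It assumes that sums run over all ordered pairs of individuals (including an individual paired with itself) and that a genetic state is any hashable Python value; the function names are hypothetical and chosen only for this example.

```python
from itertools import product


def homozygosity_measures(samples):
    """Return (phi0, phi1) for samples[k][j], the genetic state of
    individual j in sampled deme k (d demes, n individuals per deme).

    Assumption: identity is averaged over all ordered pairs, including
    an individual paired with itself.
    """
    d = len(samples)
    n = len(samples[0])

    def same(a, b):
        return 1.0 if a == b else 0.0

    # phi0: average identity of pairs drawn from the same deme.
    phi0 = sum(
        same(deme[j], deme[jp])
        for deme in samples
        for j, jp in product(range(n), repeat=2)
    ) / (d * n * n)

    # phi1: average identity of pairs drawn from any two sampled demes.
    phi1 = sum(
        same(samples[k][j], samples[kp][jp])
        for k, kp in product(range(d), repeat=2)
        for j, jp in product(range(n), repeat=2)
    ) / (d * d * n * n)
    return phi0, phi1


def fst(samples):
    """Fst = (phi0 - phi1) / (1 - phi1), defined only when phi1 != 1."""
    phi0, phi1 = homozygosity_measures(samples)
    if phi1 == 1.0:
        raise ValueError("Fst is undefined when phi1 = 1")
    return (phi0 - phi1) / (1.0 - phi1)
```

Two demes fixed for different alleles give the maximal value, while demes with identical allele compositions give zero.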



2.2 Coalescent Models 

We model the evolution of a structured population by specifying a coalescent process. Coalescent
processes are Markov jump processes. We start by defining the state space for these
coalescent processes. We use the notation found in fl^].

Let $\mathcal{G} = \{g_1, g_2, \ldots, g_D\}$. $\mathcal{G}$ represents the demes composing the population. Let
$\mathcal{X} = \bigcup_{k=1}^{d} \bigcup_{j=1}^{n} \{x_{k,j}\}$. $\mathcal{X}$ is the set of all individuals sampled from the population. Note that
$x_{k,j}$ is simply an element of $\mathcal{X}$ serving to represent the $j$th sampled individual from the $k$th
sampled deme, as opposed to $x^{gen}_{k,j}$, which represents genetic data. Let $\mathcal{X}_k = \bigcup_{j=1}^{n} \{x_{k,j}\}$. $\mathcal{X}_k$
is the set of individuals sampled from the $k$th sampled deme. Let $\mathcal{P}$ be the set of partitions
of $\mathcal{X}$. A partition of $\mathcal{X}$ corresponds to a collection of disjoint sets $E_1, E_2, \ldots, E_m$ such that
$\bigcup_{i=1}^{m} E_i = \mathcal{X}$. We specify $\pi \in \mathcal{P}$ by $\pi = \{E_1, E_2, \ldots, E_m\}$ and refer to the $E_i$ as the blocks
of $\pi$. Let $\mathcal{P}^{lab}$ be the set of partitions of $\mathcal{X}$ in which each block is assigned a label from $\mathcal{G}$.
That is,

\[ \mathcal{P}^{lab} = \Big\{ \{(E_1, g_1), (E_2, g_2), \ldots, (E_m, g_m)\} : \bigcup_{i} E_i = \mathcal{X},\ g_i \in \mathcal{G} \Big\}. \tag{2.6} \]

Intuitively, $g_i$ is the deme occupied by block $E_i$. For $\pi \in \mathcal{P}^{lab}$ we let $|\pi|$ represent the number
of blocks forming $\pi$. We define a coalescent process as a Markov process in which only two
types of state jumps are possible.

1. A labeled block $(E, a)$ may change to $(E, a')$. This is a migration event.

2. Two blocks $(E_1, a)$ and $(E_2, a)$ may combine to form a single block $(E_1 \cup E_2, a)$. This
is a coalescent event.

We let $\Pi(t)$ represent the state of a coalescent process at time $t$. So $\Pi(t) \in \mathcal{P}^{lab}$. The
different coalescent processes are specified through their different transition probabilities.
We first consider three standard coalescent processes: the Kingman coalescent, island model
coalescent, and stepping stone model coalescent.

Kingman Coalescent: 

We denote the Kingman coalescent by $\Pi_{KC}(t)$. In the Kingman coalescent we have D = 1
and so we can ignore the labels of the blocks. The jump rates of $\Pi_{KC}(t)$ are given by the
following rule:

Two blocks $\{E_i\}$ and $\{E_j\}$ coalesce into $\{E_i \cup E_j\}$ at rate 1.

Island Model Coalescent: 

We denote the island model coalescent by $\Pi_{IM}(t)$. In this model we set $\mathcal{G} = \{1, 2, \ldots, D\}$.
The jump rates of $\Pi_{IM}(t)$ are given by the following rules:

1. The labeled block {£, ,0,} migrates to {Ei,a'j} at rate 

2. Two labeled blocks {Ei,a} and {Ej,a} coalesce into {EiUEj,a} at rate ^. 

Here m is the migration rate. The island model is completely symmetric: a migrant is
equally likely to migrate to any deme.

Stepping Stone Coalescent: 

We denote the stepping stone model coalescent by $\Pi_{SS}(t)$. In this model we let $\mathcal{G}$ be the
lattice in $\mathbb{Z}^2$ specified by $[0, 1, 2, \ldots, W-1] \times [0, 1, 2, \ldots, W-1]$. To make a connection to the
island model case we set $D = W^2$. We think of $\mathcal{G}$ as a torus. The neighbor demes of deme
$(i, j)$ are $(i+1, j)$, $(i-1, j)$, $(i, j+1)$, $(i, j-1)$, where the arithmetic is modulo W. The jump
rates of $\Pi_{SS}(t)$ are given by the following rules:

1 . A block Ei migrates from its current deme to a neighboring deme at rate ^ . 

2. If two blocks, Ei and Ej, occupy the same deme then they coalesce at rate ^. 
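The torus geometry used by the stepping stone model can be made concrete with a short sketch. The function name `neighbor_demes` is hypothetical; the only assumption is that demes are indexed by pairs (i, j) with 0 <= i, j < W, as in the lattice described above.

```python
def neighbor_demes(i, j, W):
    """Return the four neighbors of deme (i, j) on a W x W torus.

    Coordinates wrap around modulo W, so demes on the edge of the
    lattice are adjacent to demes on the opposite edge.
    """
    return [
        ((i + 1) % W, j),
        ((i - 1) % W, j),
        (i, (j + 1) % W),
        (i, (j - 1) % W),
    ]
```

For example, on a 4 x 4 torus the corner deme (0, 0) has neighbors (1, 0), (3, 0), (0, 1) and (0, 3), so every deme has exactly four neighbors and the model has no boundary.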

In all the models we consider, genetic diversity is created by mutations. To model mutation,
we assume that blocks experience mutations at rate $\mu$. At $t = 0$, we set $x^{gen}_{k,j}(i) = 0$ for
all $k, j, i$. We let $e(t)$ be the mutation counter. That is, $e(0) = 0$ and every time a mutation
occurs $e(t)$ is incremented by 1. When a block, say $E$, mutates we set $x^{gen}_{k,j}(e(t)) = 1$ for
every $x_{k,j} \in E$. Often, in the case of the Kingman coalescent we will make the mutation rate
explicit by writing $\Pi_{KC}(t; \mu)$. For the island and stepping stone model coalescents we define
$\theta = \mu N D$.

While diversity measures are defined as functions on the $x^{gen}_{k,j}$, the value of each $x^{gen}_{k,j}$ is
determined by the underlying coalescent. For this reason we write $G(\Pi(t))$ to mean G under
the coalescent process $\Pi(t)$.

2.3 The G/KC coalescent

In [25], Wakeley pointed out that the dynamics of $\Pi_{IM}(t)$ can be decomposed into two phases:
a scattering phase and a collecting phase. The scattering phase describes the initial phase of
$\Pi_{IM}(t)$ in which blocks migrate away from their starting demes until every block occupies a separate
deme. Then, in the collecting phase, blocks that occupy separate demes migrate to common
demes and coalesce until a single block remains. As Wakeley pointed out, the collecting
phase is well modeled by the Kingman coalescent.

We distill three key components of the scattering-collecting decomposition that can be
applied in a more general setting than the island model.

1. During the scattering phase, no two blocks that contain individuals from separate sampled
demes coalesce.

2. The scattering phase occurs on a much faster time scale than the collecting phase.

3. During the collecting phase, the coalescent is well described by the Kingman coalescent.

We introduce a coalescent process that is a limiting version of these three requirements.
We refer to this coalescent as the G/KC coalescent and denote it $\Pi_{G/KC}(t)$. Like $\Pi_{KC}(t)$, the
blocks of $\Pi_{G/KC}(t)$ are not labeled. To define $\Pi_{G/KC}$ we specify a random partitioning of each
$\mathcal{X}_k$. More precisely, we assume that $\mathcal{X}_k$ is partitioned into $B_k$ blocks, $E_{k,1}, E_{k,2}, \ldots, E_{k,B_k}$. We
set $b_{k,j} = |E_{k,j}|/n$, which implies $b_{k,1} + b_{k,2} + \cdots + b_{k,B_k} = 1$. For $k = 1, 2, \ldots, d$, the tuples
$(B_k, b_{k,1}, \ldots, b_{k,B_k})$ are i.i.d. Since diversity measures are symmetric in the individuals forming
each $\mathcal{X}_k$, we need only specify $|E_{k,j}|$.

The random partitioning is then used to define the initial condition of the G/KC coalescent.

\[ \Pi_{G/KC}(0) = \bigcup_{k=1}^{d} \bigcup_{j=1}^{B_k} \{E_{k,j}\}. \tag{2.7} \]

The dynamics of the G/KC coalescent are given by a Kingman coalescent with mutation rate
r. That is, for some $r > 0$,

\[ \Pi_{G/KC}(t) = \Pi_{KC}(t; r). \tag{2.8} \]

$\Pi_{G/KC}$ is simply the Kingman coalescent run at mutation rate r with a random initial partitioning
of the $\mathcal{X}_k$. The G/KC coalescent is specified by r and the distribution of $(B_k, b_{k,1}, \ldots, b_{k,B_k})$.
The G/KC coalescent is a limiting version of Wakeley's scattering-collecting decomposition.
The scattering phase, which occurs on a fast time scale for $\Pi_{IM}(t)$, is instantaneous in
$\Pi_{G/KC}(t)$ and is completely general in its distribution (hence the G in G/KC). The collecting
phase, which occurs on a slower time scale, is described by the Kingman coalescent (hence
the KC in G/KC).

To each G/KC coalescent we associate scattering probabilities. Let $l, j_1, \ldots, j_l$ be positive
integers and set $J = j_1 + j_2 + \cdots + j_l$. Suppose we select J individuals from $\mathcal{X}_k$. Then
$\Xi(j_1, j_2, \ldots, j_l)$ is the probability that the J individuals are partitioned into $l$ sets of size
$j_1, j_2, \ldots, j_l$ by the blocks $E_{k,1}, E_{k,2}, \ldots, E_{k,B_k}$. We refer to $\Xi(j_1, j_2, \ldots, j_l)$ as a scattering
probability.
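For a fixed partition of a deme's sample, the simplest scattering probability (all selected individuals falling in a single block) can be computed by direct counting. The sketch below is an illustration under two assumptions that the text does not spell out: the k individuals are selected uniformly without replacement, and the block sizes are treated as given rather than random. The function name is hypothetical.

```python
from math import comb


def same_block_probability(block_sizes, k):
    """Probability that k individuals, drawn uniformly without replacement
    from a deme sample partitioned into blocks of the given sizes, all
    fall in the same block.
    """
    n = sum(block_sizes)
    # Count the k-subsets contained in one block, over all k-subsets.
    return sum(comb(size, k) for size in block_sizes) / comb(n, k)
```

For large samples this approaches the sum of the k-th powers of the block fractions: with two blocks of 500 individuals each, the probability for k = 2 is within 0.001 of 0.5. Averaging over the random partition would then give the scattering probability for a single set of size k.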
3 Results

We consider diversity measures in the large population, large sample (LPLS) limit, which we
write as $\lim_{LPLS}$ and define as follows. In the LPLS limit we take $N, D, n, d \to \infty$. The limit
requires some further assumptions depending on the coalescent process we are considering.
When we consider the island model, we set $\Gamma = Nm$ and assume that $\Gamma$ and $\theta$ are held fixed

while ^ ^ 0, M ^ 0. In the case of the stepping stone model we follow 12|] by setting 
a = We then fix a, while 0. We also require that sample demes are 

separated by a distance of at least Asampie = ^^gw ' ^'^ considering G/KC coalescents, we fix 

E\B~] 

r, assume the tuples (Bi;,bk,i, ■ ■ ■ ,bk.Bj converge in distribution, and take 0. Since 

{Bk,bic,i, ■ ■ ■ ,bk.Bt) converges, the limit of S exists and we set E ^ E. Whenever we refer to 
a limit, we are considering the LPLS limit unless we specify otherwise. 

Our first two results demonstrate that the analysis of diversity measures under the island
or stepping stone model coalescents can be reduced to the analysis of diversity measures for
G/KC coalescents. Define

\[ T_i = \beta_i \prod_{l=1}^{i-1} (1 - \beta_l), \tag{3.1} \]

where the $\beta_j$ are i.i.d. as Beta[1, 2$\Gamma$]. Then we have the following result.

Theorem 1 (Island Model Convergence). Let G be a diversity measure. Then,

\[ \lim_{LPLS} G(\Pi_{IM}(t)) = \lim_{LPLS} G(\Pi_{G/KC}(t)), \tag{3.2} \]

where $r = \theta \frac{1+2\Gamma}{2\Gamma}$, $B_k \to \infty$, and for fixed J,

\[ (b_{k,1}, b_{k,2}, \ldots, b_{k,J}) \to (T_1, T_2, \ldots, T_J). \tag{3.3} \]
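The limiting block fractions in Theorem 1 follow the stick-breaking construction (3.1), which is straightforward to sample. The sketch below is only an illustration: it truncates the infinite sequence at a finite number of blocks, and the function name and the truncation parameter are assumptions made for this example.

```python
import random


def stick_breaking(gamma, num_blocks, rng):
    """Sample T_1, ..., T_num_blocks with T_i = beta_i * prod_{l<i} (1 - beta_l),
    where the beta_i are i.i.d. Beta(1, 2 * gamma).

    The infinite sequence is truncated at num_blocks; the neglected mass
    is the length of the stick that remains unbroken.
    """
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to a block
    for _ in range(num_blocks):
        beta = rng.betavariate(1.0, 2.0 * gamma)
        weights.append(beta * remaining)
        remaining *= 1.0 - beta
    return weights
```

With $\Gamma = 1$ and a few hundred blocks the weights sum to 1 up to a negligible remainder. Averaging the sum of squared weights over many draws gives a value close to 1/(1 + 2$\Gamma$), the probability that two individuals from the same deme share a block.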
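Theorem 2 (Stepping Stone Model Convergence).

\[ \lim_{LPLS} G(\Pi_{SS}(t)) = \lim_{LPLS} G(\Pi_{G/KC}(t)), \tag{3.4} \]

where $r = \theta \frac{1+\alpha}{\alpha}$ and $\lim_{LPLS} (B_k, b_{k,1}, \ldots, b_{k,B_k})$ is distributed as the blocks of $\Pi^{(\infty)}_{KC}(\log(\frac{1+\alpha}{\alpha}))$.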

The next result characterizes the distribution of Fst under a G/KC coalescent. We split
into two cases. First, we consider the case of $r \to \infty$, which we refer to as the strong mutation
case.

Theorem 3 (Strong Mutation Case).

\[ \lim_{r \to \infty} \lim_{LPLS} F_{st}(\Pi_{G/KC}(t)) = \Xi(2). \tag{3.5} \]

Taking $r \to 0$ corresponds to the assumption of weak mutation. In computing Fst under
weak mutation we may assume that exactly one mutation occurs in the G/KC coalescent.
We assume that the mutation occurs when $|\Pi_{G/KC}(t)| = L$. Define $\lambda = \lim_{LPLS} L/d$ and
$\kappa = \lim_{LPLS} \frac{L}{E[B_k] d}$. The following theorem shows that when $\lambda = 0$, the distribution of Fst in the
weak mutation case is the same as that in the strong mutation case.

Theorem 4 (Weak Mutation Case). If $\lambda = 0$ then,

\[ \lim_{LPLS} F_{st}(\Pi_{G/KC}(t)) = \Xi(2). \tag{3.6} \]

If $\lambda \ne 0$, then the following results show that Fst has a non-degenerate distribution.

Theorem 5 (Weak Mutation Case). Assume $\lambda > 0$ and $\kappa = 0$.

^^M^cMt))^%^ (3.7) 

where Xj^ are i.i.d. versions of the random variable X which is defined by the following 
moment relations 

E[x']^Z{k) (3.8) 
and Q is Poisson distributed with rate j, where V is exponentially distributed with mean 1. 

Theorem 6 (Weak Mutation Case). Assume $\lambda > 0$ and $0 < \kappa < 1$. Let $G(\kappa)$ be a geometric
random variable with success probability $\kappa$. If $\lim_{LPLS} E[B_k] < \infty$ then

yG{K)+l 2 

lim F,„(UG/Kc{t)) = ^'=' ' (3.9) 
LPLS ifli' + V, 

where are i.i.d. versions of the random variable W which is defined by the following 
moment relations 

,.1 _ m 



If\imLPLsE[Bk] = oo then 



E[W'] = ' (3.10) 

\\mLPLsE[Bk\ 



limF,, =0. (3.11) 

LPLS 



Theorems 1 and 2 are proved in section 5. Theorem 3 is proved in section 6. Theorems
4-6 are proved in section 7.

4 Applications 

We now apply the results stated in section 3. In section 4.1 we examine the distribution of
Fst under a single locus, infinite allele model. In section 4.2 we examine Fst under a multiple
locus, biallelic model and under an infinite sites model. Finally in section 4.3 we consider
homozygosity measures.

4.1 Fst 

Fst as defined in (2.5) corresponds to a single locus, infinite allele model. In such a setting,
the distribution of Fst has been a subject of research for some time. The relation $F_{st} = \frac{1}{1+4Nm}$
was originally proposed by Sewall Wright [32]. A quantity related to Fst, which we label $F^*_{st}$,
is defined by

\[ F^*_{st} = \frac{E[\phi_0 - \phi_1]}{E[1 - \phi_1]}. \tag{4.1} \]

In [22; 30] the authors derive the value of $F^*_{st}$ under the island model, while in [11; 20] the
authors derive the distribution of Fst for the island model in the strong and weak mutation
cases. In [?] the authors derive the value of $F^*_{st}$ for the stepping stone model. We note
that while $F^*_{st}$ is a quantity of theoretical interest, Fst is more relevant in application. Fst is
a random variable, while $F^*_{st}$ is deterministic. In this paper we consider Fst. The previous
results leave two fundamental questions unanswered.

1 . How is the distribution of Fst affected by changes in the structured population model? 

2. What is the distribution of Fst for the stepping stone model? 

The first question is answered by Theorems 3-6 for populations that converge to G/KC
coalescents. In the strong mutation case, Fst will converge to a deterministic limit, while in
the weak mutation case the distribution of Fst can be computed and will depend on where in
the coalescent the mutation occurs.

Now we turn to the second question and consider Fst for the stepping stone model. For
completeness, we will also state the corresponding results for the island model. To compute
LPLS limits of Fst we need to compute $\Xi(k)$ for $k \ge 2$. For the island model, $\Xi(k)$ is the
probability that k individuals in a given deme all coalesce before a migration occurs. Simple
Kingman coalescent arguments, see fl, give

\[ \Xi(k) = \prod_{i=1}^{k-1} \frac{i}{i + 2\Gamma}. \tag{4.2} \]

Note that $\Xi(2) = \frac{1}{1+2\Gamma}$. For the stepping stone model, $\Xi(k) = P(|\Pi^{(k)}_{KC}(\log(\frac{1+\alpha}{\alpha}))| = 1)$. By
equation 5.2 in [24],

\[ \Xi(k) = f_k\left(\log\left(\frac{1+\alpha}{\alpha}\right)\right), \tag{4.3} \]

where

\[ f_k(t) = 1 + \sum_{h=2}^{k} \exp\left[-\frac{h(h-1)}{2} t\right] (-1)^{h-1} (2h-1) \frac{k(k-1)\cdots(k-h+1)}{k(k+1)\cdots(k+h-1)}. \tag{4.4} \]

Note that this gives $\Xi(2) = \frac{1}{1+\alpha}$. Using (4.2) and (4.3) and Theorem 3 we can compute the
LPLS limits of Fst. For strong mutation we have the following result.
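The two formulas for the scattering probability can be evaluated directly. The sketch below assumes the product form of (4.2) for the island model and, for the stepping stone model, the standard expression for the probability that a Kingman coalescent started from k lineages has reached a single lineage by time t. The function names are hypothetical.

```python
from math import exp, log


def xi_island(k, gamma):
    """Island model: probability that k lineages in one deme all coalesce
    before any migration, prod_{i=1}^{k-1} i / (i + 2 * gamma)."""
    prob = 1.0
    for i in range(1, k):
        prob *= i / (i + 2.0 * gamma)
    return prob


def xi_stepping_stone(k, alpha):
    """Stepping stone model: probability that a Kingman coalescent started
    from k lineages has a single block at time log((1 + alpha) / alpha)."""
    t = log((1.0 + alpha) / alpha)
    total = 1.0
    for h in range(2, k + 1):
        falling = 1.0  # k (k-1) ... (k-h+1)
        rising = 1.0   # k (k+1) ... (k+h-1)
        for q in range(h):
            falling *= k - q
            rising *= k + q
        sign = -1.0 if (h - 1) % 2 else 1.0
        total += exp(-h * (h - 1) * t / 2.0) * sign * (2 * h - 1) * falling / rising
    return total
```

Both functions return 1/3 for k = 2 when $\Gamma = 1$ and $\alpha = 2$, matching the closed forms 1/(1 + 2$\Gamma$) and 1/(1 + $\alpha$).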

Proposition 1. Let $\theta \to \infty$. Then for the island model

\[ F_{st} \to \frac{1}{1 + 2\Gamma}, \tag{4.5} \]

while for the stepping stone model

\[ F_{st} \to \frac{1}{1 + \alpha}. \tag{4.6} \]



The exact same result holds in the case of weak mutation when $\lambda = 0$. For $\lambda > 0$, we
can numerically compute the distribution of Fst. For instance, consider the case $\lambda = 2$. In
this case, $B_k \to \infty$ for the island model, so $\kappa = 0$. For the stepping stone
model, $E[B_k]$ is finite and can be numerically computed using known formulas [24] (we find
$\kappa \approx .388$). Using Theorem 5 for the island model and Theorem 6 for the stepping stone model
we can numerically compute the distribution of Fst. The result is given in figure 1 in the case
$\Gamma = 1$ for the island model and $\alpha = 2$ for the stepping stone model. In this case the mean of
Fst for the island and stepping stone models is approximately .2 and .1 respectively.

Figure 1: pdf of $\lim_{LPLS} F_{st}$ for the island model (dashed line) with $\Gamma = 1$ and the stepping stone
model (unbroken line) with $\alpha = 2$. In both cases $\lambda = 2$.

4.2 Generalizations of Fst

Today, genetic data rarely fit the single locus, infinite alleles assumption of the previous
section. We examine two generalizations of Fst. In [27], Weir and Cockerham generalized
Fst to biallelic, multiple loci data. To model such data we let $x^{gen}_{k,j}(i)$ represent the allelic
state of locus $i$ for the given individual. In [10; 13; 18] the authors consider Fst generalized
to sequence data. In this setting, we let $x^{gen}_{k,j}$ represent a string of 0s and 1s.

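We start by considering biallelic, multiple loci data. We assume $l$ loci and a single mutation
on $\Pi_{G/KC}(t)$ for each locus. Define for $i = 1, \ldots, l$

\[ \phi_{0,k}(i) = \frac{1}{n^2} \sum_{j,j'=1}^{n} \chi(x^{gen}_{k,j}(i) = x^{gen}_{k,j'}(i)), \tag{4.7} \]

\[ \phi_0(i) = \frac{1}{d} \sum_{k=1}^{d} \phi_{0,k}(i), \tag{4.8} \]

\[ \phi_1(i) = \frac{1}{d^2} \sum_{k,k'=1}^{d} \frac{1}{n^2} \sum_{j,j'=1}^{n} \chi(x^{gen}_{k,j}(i) = x^{gen}_{k',j'}(i)). \tag{4.9} \]

$\phi_0(i)$, $\phi_1(i)$ are homozygosity measures for locus $i$, and we can use these measures to form an
Fst value for each locus.

\[ F_{st,i} = \frac{\phi_0(i) - \phi_1(i)}{1 - \phi_1(i)}. \tag{4.10} \]

A key question considered by Weir and Cockerham is how to combine the $\phi_0(i)$, $\phi_1(i)$ values
in order to produce a statistic with small variance. In a widely cited paper, [27], Weir and
Cockerham suggested using $F^{WC}_{st}$, where for $\sum_{i=1}^{l} (1 - \phi_1(i)) \ne 0$,

\[ F^{WC}_{st} = \frac{\sum_{i=1}^{l} (\phi_0(i) - \phi_1(i))}{\sum_{i=1}^{l} (1 - \phi_1(i))}. \tag{4.11} \]

Alternatively, one might form a statistic by simply averaging the $F_{st,i}$. That is,

\[ F^{ave}_{st} = \frac{1}{l} \sum_{i=1}^{l} F_{st,i}. \tag{4.12} \]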
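The difference between the two multilocus statistics is easiest to see side by side. The sketch below takes per-locus homozygosity values as plain lists; it assumes the ratio-of-sums form for the Weir and Cockerham statistic and the plain average of per-locus values for the alternative. The function names are hypothetical.

```python
def fst_weir_cockerham(phi0, phi1):
    """Ratio of sums: sum_i (phi0[i] - phi1[i]) / sum_i (1 - phi1[i])."""
    denominator = sum(1.0 - p1 for p1 in phi1)
    if denominator == 0.0:
        raise ValueError("statistic is undefined when every phi1[i] equals 1")
    numerator = sum(p0 - p1 for p0, p1 in zip(phi0, phi1))
    return numerator / denominator


def fst_average(phi0, phi1):
    """Average of ratios: mean over loci of (phi0[i] - phi1[i]) / (1 - phi1[i])."""
    per_locus = [(p0 - p1) / (1.0 - p1) for p0, p1 in zip(phi0, phi1)]
    return sum(per_locus) / len(per_locus)
```

With phi0 = [0.6, 0.9] and phi1 = [0.5, 0.8], the ratio of sums is 0.2/0.7 while the average of ratios is 0.35. A locus with phi1 close to 1 has a small denominator and can dominate the average, whereas it contributes little to the ratio of sums.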

Our analysis of Fst allows us to prove the following result.

Proposition 2. Fix $l$.

\[ F^{WC}_{st}(\Pi_{G/KC}(t)) \to \Xi(2), \tag{4.13} \]

\[ F^{ave}_{st}(\Pi_{G/KC}(t)) \to \Xi(2). \tag{4.14} \]

To see Proposition 2, first note that we may assume that $\Pi_{G/KC}(t)$ has exactly $l$ mutations.
Let the levels of these mutations be $L_1, L_2, \ldots, L_l$, where the $L_i$ are i.i.d. Theorem 4 shows
that if $L_i/d \to 0$ for all $i$, then each $F_{st,i} \to \Xi(2)$ and the result will follow. Let $T_k = \inf\{t :
|\Pi_{G/KC}(t)| = k\}$. Since mutations are distributed as a Poisson process we have

P{L,)= in^'Sr^'''^ (4.15) 
Well known results for the Kingman coalescent, see for example section 1.3.1 of [4], give
j^|nG/Kc(0)| .^j^ _ ^ (9(log(|nG/Kc(0)|)) while U{Ti - T,_i) = O(i-). Using these re- 
suits and noting d < |nG/Kc(0)| < nd gives for fixed 5 > 

P(§>5)<0(»). (4.16) 
d logfl 

This shows that for fixed /, we have ^ — > 0. In fact as long as I ^ logii the result holds. 

In [27], Weir and Cockerham showed through numerical experiments that for finite samples
$F^{WC}_{st}$ has lower variance than $F^{ave}_{st}$. To explain this, we note that $F_{st,i}$ will have high
variance if ^ is (9(1). We will also eventually show, see Lemmas |7 . 61 and |T9l that the means 
of 0o(Oj 01 (0 are 0( J-) while variances are 0{ jj). Now suppose that ^ = 1 while for i =/= I, 
Li — (9(1). In this case, with F^j™ in mind, we have the following facts. 

. y[F„,i]=o(i). 

. For / ^ 1, F,,,j « S(2) and y [F,,,-] = o(l). 
These two facts give y K™] = Oijr). For F^'^, the following facts are relevant. 

. £[</)o(l)]=0(i),£[</)i(l)]=0(i). 

• For ;V 1, VMi)] = 0{l) and y [0i(/)] - 0(1). 

. For / ^ 1, Mh^ « S(2) and y [Mh^iW ^ 
These three facts give y [^'j^*^] = 0{^). For d ^ I we see that F^'" has lower variance than 

E-ave 

Now we consider Fst for sequence data. Various formulas exist for such a generalization,
see [?] for a summary, but up to small variations all are given by the formula for $F^{WC}_{st}$ given
in (4.11) with $l = \infty$. This means, if we assume a fixed number of mutations, that our analysis
from Proposition 2 holds and we have $F^{WC}_{st}(\Pi_{G/KC}(t)) \to \Xi(2)$ for sequence data.



4.3 Homozygosity Measures 

Homozygosity measures are commonly used to quantify genetic diversity. Previous work on
homozygosity measures for subdivided populations has focused on computing means for $\phi_{0,k}$
and $\phi_1$, e.g. [14; 16]. In this section we derive the distribution of $\phi_{0,k}$ under the infinite
alleles model and the assumption of strong mutation. By the definition of the G/KC coalescent,
at $t = 0$ the n individuals from sampled deme k are split into $B_k$ blocks of relative
sizes $b_{k,1}, b_{k,2}, \ldots, b_{k,B_k}$. If mutation is sufficiently strong, $r \gg 1$, each of these blocks will
experience a mutation prior to a coalescent event. In such a case, each of the blocks will
have a different allelic state. This allows us to compute the distribution of $\phi_{0,k}$.

\[ \phi_{0,k} = \sum_{j=1}^{B_k} b_{k,j}^2. \tag{4.17} \]

For the case of the island model, Theorem 1 gives

\[ \lim_{LPLS} \phi_{0,k} = \sum_{j=1}^{\infty} T_j^2. \tag{4.18} \]

For the case of the stepping stone model, let $V_{i,j}$ be exponential random variables with mean
1 that are independent over $i, j$ for $i = 1, 2, \ldots$ and $j = 1, \ldots, i$. Then, one can show (see [?])
that



Theorem[T|then gives 



-V, 



(4.19) 



1 + a V,. 1 + V^' 

^0, - E^--a°g(^)) (v,^,+;..,+...:v,^,)2 . 



1=1 



where hi, by equation 5.2 in |24], is defined as 



h,{t) = 



ir=,exp 
i+i; 



,.2exp[-(M^>](^)(- 



1) 



if 1 
if / = 1. 



(4.20) 



(4.21) 



Under strong mutation, the LPLS limit distributions of $\phi_{0,k}$ for the island model and
stepping stone model are given in figure 2. As in section 4.1 we take $\Gamma = 1$ for the island
model case, and $\alpha = 2$ in the stepping stone case. This gives, for both cases, $E[\phi_{0,k}] = 1/3$. We
note the similarity in the distribution of $\phi_0$ under the two models. Currently, there are many
statistical tests for population subdivision, but we are not aware of any statistical test that
addresses the type of subdivision. The similarity in homozygosity measures for the island
model and stepping stone model suggests that any such test should not involve homozygosity
measures.
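The stepping stone distribution of the within-deme homozygosity under strong mutation can be explored by simulation. The sketch below is a finite-sample approximation only: it starts a Kingman coalescent from n singleton blocks rather than from infinitely many, runs it for time log((1 + $\alpha$)/$\alpha$), and returns the sum of squared block fractions. The function names, the seed, and the sample size are illustrative assumptions.

```python
import random
from math import log


def kingman_block_sizes(n, t, rng):
    """Block sizes at time t of a Kingman coalescent started from n singletons.

    With k blocks present, each of the k(k-1)/2 pairs merges at rate 1.
    """
    sizes = [1] * n
    clock = 0.0
    while len(sizes) > 1:
        k = len(sizes)
        clock += rng.expovariate(k * (k - 1) / 2.0)
        if clock > t:
            break
        i, j = rng.sample(range(k), 2)  # a uniformly chosen pair of blocks
        sizes[i] += sizes[j]
        sizes.pop(j)
    return sizes


def phi0_stepping_stone(n, alpha, rng):
    """One draw of sum_j b_j^2 using the blocks at time log((1 + alpha) / alpha)."""
    sizes = kingman_block_sizes(n, log((1.0 + alpha) / alpha), rng)
    return sum((size / n) ** 2 for size in sizes)
```

Averaging many draws with $\alpha = 2$ and n = 100 gives a value near 1/3, with a small upward bias of order 1/n from pairing an individual with itself. A histogram of the draws gives a rough picture of the shape of the distribution.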



5 Convergence to the G/KC coalescent 

In this section we prove Theorems 1 and 2. To do this, we define a time $T_{scat}$ and show that
the following conditions hold.

1. (Independence Condition) The probability that individuals from separate sampled demes
coalesce before $T_{scat}$ goes to zero.

2. (Short Scattering Phase Condition) The probability of a mutation before $T_{scat}$ goes to
zero.

3. (KC Condition) After time $T_{scat}$, the coalescent converges to a Kingman coalescent.

After demonstrating these three conditions, we determine the distribution of $B_k, b_{k,1}, b_{k,2}, \ldots, b_{k,B_k}$
formed by $\Pi_{G/KC}(T_{scat})$. Lastly, we show that for both Theorem 1 and 2 the condition

^ ^ holds. 

To demonstrate the KC condition, we introduce the following notation. For a general
coalescent process $\Pi(t)$, let $E_1, E_2, \ldots, E_k$ be the blocks forming $\Pi(T_k)$. Recall $T_k = \inf\{t :
|\Pi(t)| = k\}$. Define $N_j(k \to k-1)$ as the number of mutations that block $E_j$ experiences
during time $[T_k, T_{k-1})$. Let $U_1(k)$, $U_2(k)$ be the indices of the two blocks that coalesce at time
$T_{k-1}$. If we specify some unique way of ordering the blocks $E_i$ (say by ordering $E_i$ based
on some lexicographic ordering of the $x_{k,j}$) then any diversity measure G will be a function of
$U_1(k)$, $U_2(k)$, $N_j(k \to k-1)$, and $\Pi(0)$.

We will use the following Lemma to prove the KC condition.

Figure 2: pdf of $\phi_{0,k}$ for the island model (dashed line) with $\Gamma = 1$ and the stepping stone model
(unbroken line) with $\alpha = 2$.

Lemma 5.1. Let $\Pi(t)$ be a coalescent process with $|\Pi(0)| = M$ and let $\Pi_{KC}(t, r)$ be a Kingman
coalescent with $\Pi_{KC}(0)$ equal to $\Pi(0)$ with the labels of the blocks removed. Let G be a
diversity measure. If

\[ \sum_{k=2}^{M} \sum_{j=1}^{k} \left| E[N_j(k \to k-1)] - \frac{r}{\binom{k}{2}} \right| \to 0 \tag{5.1} \]

and

\[ \sum_{k=2}^{M} \left[ 1 - k(k-1) \inf_{j,j'=1,\ldots,k;\ j \ne j'} P(U_1(k) = j, U_2(k) = j') \right] \to 0 \tag{5.2} \]

then

\[ \lim_{LPLS} G(\Pi(t)) = \lim_{LPLS} G(\Pi_{KC}(t, r)). \tag{5.3} \]

Proof. Since mutation events are Poisson processes, $N_j(k \to k-1)$ has a Poisson distribution.
Let $\tilde{N}_j(k \to k-1)$ be the $N_j$ associated with $\Pi_{KC}(t, r)$. Then $\tilde{N}_j(k \to k-1)$ has Poisson
distribution with mean $r / \binom{k}{2}$. We couple mutation events on $\Pi(t)$ and $\Pi_{KC}(t, r)$ for their respective
intervals $[T_k, T_{k-1})$ as follows. Match the $k$ blocks in $\Pi(t)$ with the $k$ blocks in
$\Pi_{KC}(t, r)$ in some arbitrary way. Apply mutations to each block according to a Poisson distribution
with mean $r / \binom{k}{2}$. Now add more mutations to each block in $\Pi(t)$ according to a Poisson
distribution with mean $E[N_j(k \to k-1)] - r / \binom{k}{2}$ (if the quantity is negative, remove mutations).
If we add (or remove) mutations in this second step we say that a decoupling event has taken
place. By (5.1) the probability of a decoupling event over all $k$ goes to zero. So we have a
coupling between the mutations on $\Pi(t)$ and $\Pi_{KC}(t, r)$.

Now we establish a coupling for $U_1, U_2$. Let $\tilde{U}_1, \tilde{U}_2$ be the $U_i$ corresponding to $\Pi_{KC}(t, r)$.
Then $P(\tilde{U}_1(k) = j, \tilde{U}_2(k) = j') = \frac{1}{k(k-1)}$. Set $a = \inf_{j \ne j'} P(U_1(k) = j, U_2(k) = j')$. We now
partition [0, 1] into $k(k-1) + 1$ intervals. $k(k-1)$ of these intervals are of size $a$ and each of
these intervals corresponds to a specific $j, j'$ combination. We couple $U_i$ and $\tilde{U}_i$ as follows.
We select a number uniformly on [0, 1]. If the number lands in one of the $k(k-1)$ intervals
corresponding to some $j, j'$ pair then we coalesce the same blocks in $\Pi(t)$ as coalesce in
$\Pi_{KC}(t, r)$. Otherwise, if the number falls in the interval that does not correspond to a $j, j'$ pair,
we say a decoupling has occurred and we coalesce each process separately. (5.2) shows that
over all $k$ the probability of a decoupling goes to zero. So we have a coupling between the
blocks that coalesce in $\Pi(t)$ and those that coalesce in $\Pi_{KC}(t, r)$.

The result now follows from the observation that G is bounded and depends only on the
number of mutations in each block and the order in which the blocks coalesce.

□ 
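The interval construction used to couple the coalescing pairs in the proof above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' procedure in full: the pair probabilities of the general process are supplied as a k x k table, ordered pairs are enumerated in row-major order, and the behaviour after a decoupling (where each process draws its own pair) is left out. The function names are hypothetical.

```python
def decoupling_probability(pair_probs):
    """Return 1 - k(k-1) * a, where a is the smallest probability that the
    general process coalesces a given ordered pair (j, j') with j != j'.

    pair_probs[j][jp] is P(U1 = j, U2 = jp); diagonal entries are ignored.
    """
    k = len(pair_probs)
    a = min(pair_probs[j][jp] for j in range(k) for jp in range(k) if j != jp)
    return 1.0 - k * (k - 1) * a


def coupled_pair(pair_probs, u):
    """Map a uniform draw u in [0, 1) to the ordered pair coalesced in both
    processes, or return None if u lands in the leftover (decoupling) interval.
    """
    k = len(pair_probs)
    pairs = [(j, jp) for j in range(k) for jp in range(k) if j != jp]
    a = min(pair_probs[j][jp] for j, jp in pairs)
    if a > 0.0:
        index = int(u // a)  # which interval of length a contains u
        if index < len(pairs):
            return pairs[index]
    return None
```

When the general process chooses pairs uniformly, as the Kingman coalescent does, the leftover interval has length zero and the two processes always coalesce the same pair.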

Before proceeding we set some notation. For any coalescent $\Pi(t)$ (that is $\Pi_{KC}$, $\Pi_{IM}$, $\Pi_{SS}$,
$\Pi_{G/KC}$) we let $\Pi_k(t)$ for $k = 1, \ldots, d$ represent $\Pi(t)$ with the blocks intersected against $\mathcal{X}_k$.
That is, if

\[ \Pi(t) = \{(E_1, a_1), (E_2, a_2), \ldots, (E_m, a_m)\}, \tag{5.4} \]

then

\[ \Pi_k(t) = \{(E_1 \cap \mathcal{X}_k, a_1), (E_2 \cap \mathcal{X}_k, a_2), \ldots, (E_m \cap \mathcal{X}_k, a_m)\}. \tag{5.5} \]

For $\Pi_{IM}(t)$ and $\Pi_{SS}(t)$, unless specified otherwise, we take $\Pi(0) = \bigcup_{k=1}^{d} \bigcup_{j=1}^{n} \{(x_{k,j}, g(k))\}$,
where $g(k)$ is the deme label in $\mathcal{G}$ corresponding to the $k$th sampled deme. Since the deme
labels in $\Pi_{KC}(t)$ may be ignored, $\Pi_{KC}(0)$ is specified by $|\Pi_{KC}(0)|$. We write $\Pi^{(k)}_{KC}(t)$ for
$\Pi_{KC}(t)$ with $|\Pi_{KC}(0)| = k$.

We use $\Theta$ and $I$ to represent various probability events and integrals respectively. Within
a given proof, $\Theta$ and $I$ are consistently used, but their definition varies between proofs. We
use C as an arbitrary constant that may change from line to line.



5.1 Island Model and G/KC 

In this section we prove Theorem 1. Set $T_{scat} = N\sqrt{D}$.

Lemma 5.2 (Independence Condition). Let $\Theta$ be the event in which two blocks from separate
sampled demes coalesce before time $T_{scat}$. Then,

P(©) = 0(^^) (5.6) 

Proof. A block migrates to a deme that is occupied by another block at a rate bounded by 
So the probability of a block entering a deme occupied by another block before time Tscat is 
bounded by 

r""'" m m Tscatw 1 ,, „ 



Summing this probability over all possible pairs gives the result. 



□

Lemma 5.3 (Short Scattering Phase Condition).

$$E[\text{number of mutations before } T_{scat}] = O\!\left(\frac{nd}{\sqrt{D}}\right). \tag{5.8}$$

Proof. There are at most $nd$ blocks in the time interval $[0, T_{scat}]$. Then,

$$E[\text{number of mutations before } T_{scat}] \le \mu (nd) T_{scat} = O\!\left(\frac{nd}{\sqrt{D}}\right) \to 0. \tag{5.9}$$

□ 

Before proving the KC condition, we show that each block of $\Pi_{IM}(T_{scat})$ occupies a separate deme. We refer to a state $\pi$ as a scattered state if each block occupies a separate deme. We refer to $\pi$ as a semi-scattered state if two blocks share the same deme while all other blocks are in separate demes.

Lemma 5.4.

$$P(\Pi_{IM}(T_{scat}) \text{ is a scattered state}) \to 1. \tag{5.10}$$

Proof. We demonstrate that the following two facts hold in the LPLS limit.

• every block experiences at least one migration

• during $[0, T_{scat}]$, blocks migrate to demes that are unoccupied by other blocks.

To see the first fact, recall that blocks migrate at rate $m$. So the probability of a block not migrating away from its sample deme by time $T_{scat}$ is $O(\exp[-\sqrt{D}])$. To see the second fact, recall that a block migrates to a deme occupied by another block at a rate of at most $\frac{m\,nd}{D}$. So the probability of migrating to an occupied deme is $O(\frac{nd}{\sqrt{D}})$. Summing these probabilities over all possible blocks shows that at time $T_{scat}$ every block is in a separate deme with probability $1 - O(\frac{(nd)^2}{\sqrt{D}})$. Taking the LPLS limit finishes the proof.



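□

Lemma 5.5 (KC condition).

$$\sum_{k=2}^{|\Pi_{IM}(T_{scat})|} \sum_{j=1}^{k} \left| E[N_j(k \to k-1)] - \frac{\theta}{\binom{k}{2}} \frac{1+2\Gamma}{2\Gamma} \right| \to 0. \tag{5.11}$$

For $j, j' = 1, 2, \dots, k$ with $j \ne j'$,

$$\sum_{k=2}^{|\Pi_{IM}(T_{scat})|} \left| 1 - k(k-1) \inf_{j,j'=1,\dots,k;\ j \ne j'} P(U_1(k) = j, U_2(k) = j') \right| \to 0. \tag{5.12}$$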



Proof. Assume that Tl{Ti;) is in a scattered state. The process goes to a semi-scattered state at 
rate — ML_ilI Once the blocks are in a semi-scattered state, three events can occur. 

We specify the rates of these three events. 

• the two blocks can coalesce (rate: ^). 

• the blocks can return to a scattered state, (rate : 2m(l — ^) ~ ^{l — ^))- 



• the blocks can enter a state that is neither a scattered state nor a semi-scattered state, 
(rate: 0{^)). 

If the blocks return to a scattered state, the whole situation starts over. Let the event of 
entering a state that is not a scattered state nor a semi scattered state be 0. A simple ratio 
shows ^ 

P(0) = O(^) (5.13) 
Now consider E[Nj{k k— 1 )] . We have, 

E[Njik -^k~l)]= ju£[r,_i - n] (5.14) 

The blocks occupy a scattered state for time with mean ^k^k^i) ■ Once in a semi-scattered 
state, outside of the event 0, the blocks either coalesce or return to a scattered state in time of 
order 0{N), the probability of coalescing is j^^r '^^ij))- this all together and using 

(15.13b gives. 



, ND l+2r, ,k^,. 



(5.15) 



Plugging the above expression into ( 15.141 ). summing over k, and taking the LPLS limit 
gives ( 15.111 ). By the symmetry of the island model, if Tl{Tk) is in a scattered state then 
P{Ui{k)= j,U2{k) = j') = ^i^- By dmil and LemmaEll the probability of n(7i.) being 

in a scattered state over all k is bounded below by 1 — (9(^^L2'^ij)- ^^^^ gives ( |5.12t . 

□ 
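
The scattered and semi-scattered cycle in the proof above can be checked numerically for a sample of two blocks. The sketch below is an illustration, not the paper's computation: it assumes the two blocks enter the same deme at rate $2m/D$, coalesce there at rate $1/N$, and separate again at rate $2m(1 - 1/D)$, and it reads $\Gamma$ as $Nm$. Under those assumptions the mean coalescence time solves a two-state linear system and equals $ND\,\frac{1+2\Gamma}{2\Gamma}$. The function names are hypothetical.

```python
def mean_coalescence_time(N, D, m):
    """Mean time for two blocks in separate demes to coalesce.

    Solves E_S = 1/a + E_SS and E_SS = 1/(b + c) + (b / (b + c)) * E_S,
    where S is the scattered state and SS the semi-scattered state.
    """
    a = 2.0 * m / D                 # scattered -> semi-scattered (assumed rate)
    b = 2.0 * m * (1.0 - 1.0 / D)   # semi-scattered -> scattered (assumed rate)
    c = 1.0 / N                     # coalescence while sharing a deme
    return ((b + c) / a + 1.0) / c


def leading_order(N, D, m):
    """N * D * (1 + 2*Gamma) / (2*Gamma) with Gamma = N * m (assumed reading)."""
    gamma = N * m
    return N * D * (1.0 + 2.0 * gamma) / (2.0 * gamma)
```

For example, `mean_coalescence_time(1000, 10000, 0.001)` and `leading_order(1000, 10000, 0.001)` agree up to floating point error, which is the scaling that appears in the KC condition for the island model.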
Lemmas 5.2-5.5 prove (3.2) in Theorem 1. We are left to specify the distribution of $\Pi_{IM,k}(T_{scat})$. As observed in earlier work, $B_k$ and the $b_{k,j}$ are specified by the Ewens Sampling Formula. More precisely, the following theorem follows from a result of Hoppe and our lemmas above.

Theorem 7 (Hoppe's Urn Theorem). Let $\xi_i$ be a Bernoulli random variable with success probability $\frac{2\Gamma}{2\Gamma + i - 1}$. Assume that $\xi_2, \xi_3, \dots$ are independent. Then,

$$B_k = 1 + \xi_2 + \xi_3 + \cdots + \xi_n, \tag{5.16}$$

and the $B_k$ are i.i.d.
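
As an illustration of the urn representation, the following sketch draws the number of blocks as one plus a sum of independent Bernoulli variables and compares the sample mean with the exact expectation. The parameter `theta` stands for the numerator of the success probability; its exact form in the theorem is not assumed here, and the function names are hypothetical.

```python
import random


def sample_num_blocks(n, theta, rng):
    """One draw of 1 + xi_2 + ... + xi_n, xi_i ~ Bernoulli(theta / (theta + i - 1))."""
    return 1 + sum(rng.random() < theta / (theta + i - 1) for i in range(2, n + 1))


def expected_num_blocks(n, theta):
    """Exact mean of the sum above; the i = 1 term contributes exactly 1."""
    return sum(theta / (theta + i - 1) for i in range(1, n + 1))
```

A quick comparison such as averaging `sample_num_blocks(50, 2.0, rng)` over many draws against `expected_num_blocks(50, 2.0)` shows the logarithmic growth in $n$ that is typical of the Ewens Sampling Formula.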

Using a theorem of Donnelly and Tavare jstl we have the following result. 
Theorem 8. For fixed $J$,

lim{bk,,M.2, ■ ■ -Mj) = (Ti , Y2, . . . Jj) (5.17) 

where T is defined as in Theorem\l\ 

Finally we note that using Lemma|7] a simple computation shows = (9(^2SJ£l). By 
our assumptions on the LPLS limit of a stepping stone model coalescent we have ^ ^ 0. 

5.2 Stepping Stone Model and G/KC 

This section is dedicated to the proof of Theorem 2. In [1; 2; 33], the authors made significant breakthroughs in the analysis of the stepping stone model coalescent. In this section, we draw heavily from the theory developed in those articles, especially from the work of Zahle et al. [33]. Our results use the basic techniques introduced by these authors, although there
are several important differences. Zahle et al. assume that sampled individuals are initially 
spaced far apart, while we start with $n$ individuals in each deme. Further, Zahle et al. assume fixed $n, d$ as $N, D \to \infty$ while we take $n, d, N, D \to \infty$. Perhaps more importantly, while Zahle et al. use an integral approach to prove their results, we use a differential approach.

We feel that the results in [1; 2; 33] have not received the attention they deserve within the population genetics literature due to their theoretical complexity. We hope that by providing a different approach to the theory of [1; 2; 33], we will encourage researchers with more applied interests to use the theory. Below, wherever possible, we use the notation of Zahle et al.

Let $\mathbb{T}$ be a two dimensional torus of width $W$ corresponding to the stepping stone model. In the stepping stone model we may think of the blocks in $\Pi_{SS}(t)$ as coalescing random walkers on $\mathbb{T}$ moving with rate $m$. Given two random walkers on $\mathbb{T}$, let $T_0$ be the first time the two walkers occupy the same deme. Let $t_0$ be the time at which the two walkers coalesce. From a technical perspective it is simpler to consider a single random walker moving at rate $2m$ than two random walkers moving at rate $m$. When we consider a single random walker we let $T_0$ be the time at which the random walker hits the origin $(0,0)$. To consider $t_0$ we let a coalescent event occur at rate $\frac{1}{N}$ when the walker is at the origin; then $t_0$ is the time at which a coalescent event occurs. We let $P_x^{(w)}(\Theta)$ be the probability of an event $\Theta$ for a random walker starting in deme $x$ and moving at rate $w$. We let $p_s^{(w)}(x, y)$ be the probability that a random walker starting at $x$ and moving at rate $w$ will be in deme $y$ at time $s$. Finally, $P_x(\Theta) = P_x^{(1)}(\Theta)$ and $p_s(x, y) = p_s^{(1)}(x, y)$.

Before proceeding, we state some technical results concerning random walks on T^. 
These results can be found in UQl, we refer the reader to those works for the proofs. 

Lemma 5.6. For t < eW^logW, 

limP(o,i)(ro >t) = ^(l+0{e)). (5.18) 

If\x\=o{W) then 

lim/?v(x,0) <c4- (5.19) 
If \x\ — > °°, \x\ — oiW) and s < x^ then 

Ps{x,Q) < C ^ ^ . (5.20) 



If tw ^ °° then 



1 



W'\Pr„w4x,y))^^\^0. (5.21) 
If s — > oo and s < CW then 

C 

limp,(;c,0) < -. (5.22) 
s 

Set Tscat = j^- Recall that Ajampie — yj^^ is the minimum distance between sampled 
demes. 

Lemma 5.7 (Independence Condition). Let be the event in which two individuals from 
separate sampled demes coalesce before time T^cat- Then, 

Proof. We can consider a single random walk moving at rate 2m that starts at position x with 
|.^| > ^sample- Let 5 — We compute P^"^\Tq < T^cat) by considering the last time the 

walker is at the origin and rescaling time by 2m: 

P^^'"\To < Tscat) = dsp,{x,Q)P^oA){To >W^~s) (5.24) 

< dsps{x,0)P^oi)iTo>{l-8)W^)+ dsp,{x,0)P^Qu{To>W^^s). 
Jo Jsw^ 

Consider I\ . Using ( 15.181 1 and ( |5.19l l in the expression for /i gives, 

h < -^f^ = 0{5). (5.25) 
logW 
Now consider $I_2$. Using (5.18) and (5.22) gives



^2 < m / , dsP{T(, >W^~s)< (5.26) 
oW^ Jsw^ dlog(W) 



Combining ( 15.25b and ( 15.26b gives 



Pi^"\To<T,)^0{-^). (5.27) 



Considering all possible pairs finishes the proof. 
□

Lemma 5.8 (Short Scattering Phase Condition).

$$P(\text{mutation before } T_{scat}) = O\!\left(\frac{nd}{\log W}\right). \tag{5.28}$$

Proof. There are at most $nd$ blocks in the time interval $[0, T_{scat}]$. Then,

$$E[\text{number of mutations in } [0, T_{scat}]] \le \mu (nd) T_{scat} = O\!\left(\frac{nd}{Nm}\right) = O\!\left(\frac{nd}{\log W}\right). \tag{5.29}$$

□ 

Before demonstrating the KC Condition we prove some preliminary lemmas. First, we show that at time $T_{scat}$ the blocks are far apart from one another. Define

r{k) ^{ne ^'^^ : |;r| = ^, if (Eugi), {£2,82} e ^ then \g, ~g2\ > -} (5.30) 

(logW)2 

Lemma 5.9. Let M ^ \UssiT,cat)\- Then, 

P{nss{Tsca,) i r(M)) = O(^) (5.31) 

Proof. Given two random walkers y\^y2 starting at some arbitrary displacement x, by ( 15.22b 
we have 

w ^ 1 

^(bl(rscat)-}'2(rscat)|>- r)< E -). (5.32) 

(logW)2 |.,|^ ^ log^ 

(logW)2 

Considering all possible pairs gives the result. 

□ 

For Lemmas l5?T0ll5.12l we set Ar = £{^^'W'^\o%W and Af = 2mAf where £ = (^Vt^^Ny 
For the sake of clarity we keep certain expressions in terms of e. The results stated in Lemmas 
ISlQl and lSTD can be found in Q. 
Lemma 5.10. If $|x| > \frac{W}{(\log W)^3}$ then



'"'(7b < ^t) = e(l + (9(ei)), (5.33) 

where 

loglog^ (5.34) 

I — I 

Proof. In [ 1], Cox showed that once two blocks are sufficiently far apart, the time it takes the 

pair to enter the same deme is exponentially distributed with mean ^ ^Mm^^ • approach 

will be to divide time into intervals of size Af = ^YL.^3S£l Wg will show that during a time 
interval Af, the probability of two blocks entering the same deme is approximately £. 
By the same argument as in Lemma lS^ we have 

P^"'\Tq <M) = j\sp,{x,Q)P(a.x){To >Kt-s) (5.35) 

dsp,{x,0)P(oi){To> At-s)+ / dsp,{x,0)P(oi){To> At-s) 

^ ' ' JfW^iogiogiy 

/■Ar 

dsps{x,0)Pio i){To > At-s) 

JeW^s/TogW 

= h+h + h- 

We first show that I\ has small contribution. Using (15.181 1, ( |5.20| i, and ( 15.221 ) we arrive at, 

1 /-EW^loglogW I loglogW loglogW 

h=0{^—){\+ ds-)^o r^^ )^eO{ ). (5.36) 

Now consider /2. Using ( 15.18b and ( I5.22l i gives 

/logloglVX 

l2 = e . (5.37) 



Now consider /3 . By using (I5.18l l and (15.211) some manipulation of the integral gives 

I,^e(l + Oi'2f2^)). (5.38) 



logW 

Putting ( |5.36t , ( 15.371 1, and ( 15.381 ) together gives the result. We pause to note that if we con- 
sider At — we would have arrived at the same asymptotic result. That is, 

P,{T,<At- ^°g^°g^ ) = £(l+0(.i))- (5.39) 



Lemma 5.11. 



where 



□ 



Pg;; (Jo < A? ) = ^ + 0{e2) , (5.40) 

loglogW 
\ogW 
Proof. Recall, to compute $P_{(0,0)}^{(2m)}(t_0 < \Delta t)$ we consider a random walker moving at rate $2m$, with the stipulation that when the random walker is at $(0, 0)$ there is a coalescent event at rate $\frac{1}{N}$. So we may characterize the behavior of the random walker through the random variables $H, t_1, t_2, \dots, t_H, u_0, u_1, \dots, u_{H+1}$, where $H$ is the number of excursions taken by the random walker away from zero before a coalescent event occurs, $t_1, \dots, t_H$ are the time spans of these excursions and $u_0, \dots, u_{H+1}$ are the time spans spent at the origin between excursions. $H$ is geometric with success probability $\frac{1}{1+2Nm}$. Each of $u_0, u_1, \dots, u_H$ is an exponential random variable with mean of order $N$.
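
The geometric law of $H$ comes from a race between two exponential clocks at the origin. The sketch below is a minimal illustration under the rates stated in the proof (coalescence at rate $1/N$, departure at rate $2m$); the function name is hypothetical. With success probability $\frac{1}{1+2Nm}$ per visit, the mean number of excursions before coalescence is $2Nm$.

```python
import random


def excursions_before_coalescence(N, m, rng):
    """Count departures from the origin before a coalescent event occurs."""
    count = 0
    while True:
        t_coal = rng.expovariate(1.0 / N)   # coalescence clock, rate 1/N
        t_leave = rng.expovariate(2.0 * m)  # departure clock, rate 2m
        if t_coal < t_leave:
            return count                    # coalesced during this visit
        count += 1                          # walker left on another excursion
```

Averaging `excursions_before_coalescence(100, 0.01, rng)` over many runs should give a value near $2Nm = 2$.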

We first consider the distributions of the $t_i$; clearly the $t_i$ are i.i.d. We distinguish between three types of excursions. Set $K = \log W$ and define

Type I: $t_i \in [0, \frac{\Delta t}{KNm}]$.
Type II: $t_i \in (\frac{\Delta t}{KNm}, \Delta t]$.
Type III: $t_i > \Delta t$.

By (15.181 1 we have 

P(Type I) = 1 - 0( / ) (5.42) 

logAf-log(^rA^m) 

P TypellHQ r j 
(logAf)2 

In 

f (Type III) -> 1 



log(A/) 

In the following we ignore the time contributions of the $u_i$. Including the $u_i$ does not change the argument much and the order of the error terms stays the same, so we drop the $u_i$ for the sake of clarity. We first show that the probability of experiencing a Type II excursion before
the coalescent event is small. The probability of a coalescent during any given visit to the 
°"gin is j^jf^, = <5(]^). The probability of a Type II excursion is y^^f (Type II) = 

o C°il^°^y. )- Then taking the appropriate ratio gives, 

log log 

f (type II excursion before coal.) = 0{ ) (5.43) 

logW 

We now show that if no Type II or III excursions occur then we coalesce with very high probability. Indeed, if no Type II or III excursions occur then we will coalesce before time $\Delta t$ if we coalesce before there are $KNm$ Type I excursions. The probability of not coalescing for $KNm$ Type I excursions is

2Nt71 \^^'" / 1 1 \0(logW(Wm)) J 



So up to eiTors of order we can reduce the computation of P(o,o)(fo < ^0 to the 

probability that a coalescent event occurs before a Type III excursion. Computing the relevant 
ratio then gives, 

1 I02I02W 

P(coal before Type III) = + 0( f ^ ). (5.45) 

^ ' l + a ^ logW ' 
Putting all this together gives the result. Finally we note that this result would hold if we replaced $\Delta t$ by $\varepsilon W^2 \log\log W$. That is,



Lemma 5.12 (KC Condition). 

|ni-i(7',„r)| k 

E E 

k=2 ;=1 
For jj' = 1,2,... ,k with j ^ f 



E[Nj{k^k-\)]--^ 



e l + a 



a 



□ 



(5.47) 



^ ll-fc(^-l) inf P{Ui{k)=j,U2ik)=j')\^0 (5.48) 

<.=2 JJ=i,--k;j7^j' 



w 



Proof. We would like to combine Lemmas |5 . 1 01 and |5 . 1 1 1 to show that for |x| > p, 

(logW)! 

p!^"'\to<At) = Y^ + eOiei). (5.49) 
By using Lemmas |5 . 1 01 and |5 . 11 1 we have 

P,{to < At) < P,{To < At)P(ofl){to < At) = + eO{ei) (5.50) 
A lower bound is provided by using ( |5.39t and ( 15.461 1: 

Px{to < At) > P,{To < Ar-ew2loglogW)P(o,o)(ro < eW^loglogW) (5.51) 
^ -eO(ei). 



l + a 



This proves (5.49). Up to this point we have limited ourselves to interactions of two blocks. Now, however, we consider $\Pi_{SS}(0) \in \Gamma(k)$. First we compute $P(\Pi_{SS}(\Delta t) \notin \Gamma(k-1) \cup \Gamma(k))$. There are two ways for this to occur. Either two blocks out of the $k$ are within $\frac{W}{(\log W)^2}$ but have not coalesced at time $\Delta t$, or two coalescent events have occurred. We consider the first case. Let $y_1$ and $y_2$ be random walkers moving at rate $m$ that start $x$ units apart. Assume that if $y_1$ and $y_2$ enter the same deme then they immediately coalesce. Let $\tilde{y}_1, \tilde{y}_2$ be independent random walkers that do not coalesce. Let $\Theta_1$ be the event in which $y_1$ and $y_2$ do not coalesce but are within $\frac{W}{(\log W)^2}$ units of each other at time $\Delta t$. We have the following bound,

W 

P{@i) < PilM^t) -y2(Af)| < r) (5.52) 

l0g(W)2 

To prove this inequality we use a coupling argument. Couple $\tilde{y}_1$ to $y_1$ and $\tilde{y}_2$ to $y_2$. By this we mean that the pairs move together. However, if $y_1$ and $y_2$ coalesce then we decouple the two pairs and $\tilde{y}_1$ and $\tilde{y}_2$ begin to move independently of $y_1$ and $y_2$. No path in $\Theta_1$ will experience a decoupling, so the inequality follows. We now bound the right side of (5.52).

PiMAt) -nm < -^-t) < L /'if (^,^) (5.53) 

log(lV)5 |.|^ yy 

, ,2 C ^, 1 , 

< { ) = Oi ) 

" log^T 4ogW^- 

where we have used ( |5.21t to achieve the inequality directly above. 

Now we consider the possibility of two coalescent events during time Af . By the same 
methods as just described, we can show that if a single coalescent event occurs at some point 
in time Af, then with high probability all blocks will still be more than , ^ units apart. 

-y'log{W) 

Then we repeat the argument and are able to show that the probability of two coalescent 
events is of order (^(e^). So finally we have after allowing for all possible pair combinations, 

P(n(Ar) i Tik- l)um) = 0( + ^V) = eOi^). (5.54) 

From (15.49b the probability of a coalescent event between any two blocks is -j-^ + 
eO(ei), giving 

P{Yl{At)er{k-l))=Q^ + eO{k'e), (5.55) 
P{n{At) e m) = 1 - Q + eOik'e), 



where 

log log W 



If we consider coalescent events after time Tscat we have. 



(5.56) 



This then gives, 

e A+a 



E[Njik-.k~l)] = -j-{—-)(l+0{k^e)). (5.58) 
[2) " 



Summing over j = 1,2,. . . ,k and then summing over k — 2,3,... ,nss(7"scat) gives 



|nss(7;cat)| k 

L L 

k=2 i=\ 



E[Nj{k^k-\)]-—{^^) 



e l + g. 



< 0(|nss(rscat)re) < Oiindye) ^ 0. 

(5.59) 



Using (15.54b and (15.55b and the same argument as in Lemma l53] gives (15.48b . 



□ 



Lemmas 5.7-5.12 prove (3.4) in Theorem 2. Finally we characterize the distribution of $\Pi_{SS,k}(T_{scat})$. The result stated in Lemma 5.13 is very similar to Theorem 3 in [33], and our proof follows the methods introduced in Lemma 5.12, so we simply sketch the proof.

Lemma 5.13.

Tlss.k{T,car) ^ nj,~'(log(i^)). (5.60) 

Proof. We partition the interval $[0, T_{scat}]$ by the points $t_k$ such that $t_k = W^{kp}$ where $0 < p < 1$. We eventually select $p$ to optimize our error terms. We will asymptotically compute the probability of a pair coalescing in the interval $[t_k, t_{k+1}]$. Further, we will show that at the end of this time interval, the blocks are always separated by a significant distance. The first time interval, $[0, t_1]$, is special as we start with $n$ blocks all in the same deme.

To make all this precise, we introduce the following notation. If ;r G then n G 
H^p\i) if |;r| = j and every pair of blocks in n is separated by a distance of a least -^=^- 

Now suppose that for some k such that 1 < < ^ we have Il{tk) G //p*' (./'). Then by the same 
techniques used in Lemmas 15.101 and 15.111 we can show that the probability of two blocks 
entering the same deme is approximately j, and once two blocks are in the same deme, the 
probability of coalescing is approximately (1 + ;^^) 
Since |n(f^)| = j, we have the following results 

pmk+i) e Hf+'\j)) - 1 - f^) (^)(y^), (5.61) 



kp 



p(n(f,+i) G Hf+'\j 1)) ^ (i)(T^); 

For the interval [0,fi] things are a bit different as we start with n blocks that all occupy the 
same deme. But in this case we can show the following 

P{Il{h)eH'^p\n))^\. (5.62) 

From the above computations, we note that up to vanishing error terms, each pair of blocks in $\Pi(t_k)$ is equally likely to coalesce in $[t_k, t_{k+1}]$. Now we can compute the probability of no coalescent event occurring up to time $T_{scat}$.



P{no coal, by = n 1 - U WMt-^) (5-63) 




This computation can be easily generalized to the probability of a coalescent event between 
any two time points in [0,rscat]. The probabilities are recognized as precisely those of the 
coalescent probabilities of the Kingman coalescent run to time logit^. The result then 
follows. 

□ 

Finally we note that using Lemma l5.13l and standard Kingman coalescent results [6J we 

E\B~] 

can show that 0. 
6 Fst under Strong Mutation

In this section we prove Theorem 3. Recall $F_{st} = \frac{\phi_0 - \phi_1}{1 - \phi_1}$. The theorem will follow from two observations. First, $\phi_1 \to 0$, and second, $V[\phi_0] \to 0$. More precisely, the next two lemmas describe the behavior of $\phi_1$ and $\phi_0$.
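
To make the ratio concrete, here is a small sketch that computes $F_{st}$ from allele labels. It is an illustration under assumptions, not the paper's estimator: $\phi_0$ is taken as the average within-deme identity probability and $\phi_1$ as the average between-deme identity probability, both computed from allele frequencies (sampling with replacement). The function name `fst` and the input layout are hypothetical.

```python
from collections import Counter
from itertools import combinations


def fst(demes):
    """F_st = (phi0 - phi1) / (1 - phi1) from a list of demes (lists of alleles).

    Requires at least two demes; undefined (division by zero) when phi1 == 1,
    i.e. when every sampled individual carries the same allele.
    """
    freqs = []
    for deme in demes:
        counts = Counter(deme)
        freqs.append({allele: c / len(deme) for allele, c in counts.items()})
    # Within-deme identity: average over demes of the sum of squared frequencies.
    phi0 = sum(sum(p * p for p in f.values()) for f in freqs) / len(freqs)
    # Between-deme identity: average over unordered pairs of distinct demes.
    pairs = list(combinations(freqs, 2))
    phi1 = sum(sum(p * g.get(a, 0.0) for a, p in f.items()) for f, g in pairs) / len(pairs)
    return (phi0 - phi1) / (1.0 - phi1)
```

Two demes fixed for different alleles give $F_{st} = 1$, while two demes with identical allele frequencies give $F_{st} = 0$. The excluded case $\phi_1 = 1$ is the event the lemmas below condition away.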



Lemma 6.1. 



lim lim£[^i I ^1 7^ 11 =0 (6.1) 



Proof. We start by considering simply $E[\phi_1]$ rather than $E[\phi_1 \mid \phi_1 \ne 1]$.

^['?'i] = 472 E E[I{xk.j=Xk,.r)] (6.2) 



k,k' = \j.i'=\ 

,1 
'1' 



where ki 7^ ^2- By the definition of a G/KC coalescent and the properties of a Kingman 
coalescent Xj^^ i and Xj^^ i coalesce at rate 1 while a mutation occurs at rate r. This gives, 

E[I{xk,.i=Xk^,y)] = -^ = 0{-). (6.3) 
1 + r r 

This gives £[01] -^0{\). Since 

= £[(/)! I ^ ^ 1) = 1), (6.4) 

we will have £[^1 | 0i 7^ 1] ^ 0{^) if we can show P{^\ = 1) ^ 0{\). But note 

P((?)i = l) <£[(/)!]. (6.5) 

Taking lim^-^oo finishes the proof. 



□ 



Now we show that 0o approaches a deterministic value. 
Lemma 6.2. 



lim lim 0o 



= S(2) (6.6) 



Proo/ We first show that V[<^] 0. 



cl 



\k=l 



1 1 

-2 E Cov(0o,i:,0O,i:')+C(;7) 



k'.,k"=l.k'^k" 
For $k' \ne k''$ we have the following relation

Cov(0o,i,0O,A-') ^Xki^2)I{Xk".l ^Xk".2)] (6-8) 

-E[I{xki,i =Xi/.2)]£'[/(xi".i =JCF.2)]+0(-). 

Now we use a coupling argument introduced in [20]. We sketch the coupling argument and direct the reader to [20] for further details. Let $\Pi(t)$ be a G/KC coalescent started with the following four individuals in separate blocks: $x_{k',1}, x_{k',2}, x_{k'',1}, x_{k'',2}$. Now define two G/KC coalescents $\Pi^{*'}(t)$ and $\Pi^{*''}(t)$ started with the individuals $x^*_{k',1}, x^*_{k',2}$ and $x^*_{k'',1}, x^*_{k'',2}$ respectively in separate blocks. We couple $\Pi(t)$, $\Pi^{*'}(t)$, $\Pi^{*''}(t)$ as follows. At the outset, the block containing each $x$ is coupled to the correspondingly indexed $x^*$. By this we mean that the two blocks experience the same coalescent, migration, and mutation events. If a block in $\Pi(t)$ containing a $k'$ indexed $x$ coalesces with a block containing a $k''$ indexed $x$ then we say that a decoupling has occurred. Once a decoupling occurs, the three coalescents evolve independently. Set

/= (l{xi,, i =X,,i2)-I{xl,^ =%^2)) i^K^k",! =Xk",2)-I{Xk"A =4",2)) (6-9) 

Observe, 

Covi(^o.k,(po.k')^E[I] (6.10) 

Observe further, if a mutation or coalescent event occurs before the decoupling coalescent event then $I = 0$. We have,

$$P(\text{decoupling event before mutation event}) \le 4E[I(x_{k',1} = x_{k'',1})]. \tag{6.11}$$

These two observations give 

Cov(0o,/i,0o,/i') <'E'['^(-^i:',i =-^i:",i)] ^ <5(i), (6.12) 

where we have used i6.3i to obtain the result directly above. Plugging (|6.12| l into ( 16.71 1 gives 
y[0o]-><5(i). Now note 

£[(/)o[ =£[0o,i] =£[/(xi,i =xi.2)]+0(-). (6.13) 

n 

If $x_{1,1}, x_{1,2}$ occupy the same block in $\Pi_{G/KC}(0)$ then we will have $x_{1,1} = x_{1,2}$. Otherwise, by arguments given in Lemma 6.1 we will have, with limiting probability 1, $x_{1,1} \ne x_{1,2}$. It then follows by the definition of $\Xi$ that

$$E[I(x_{1,1} = x_{1,2})] \to \Xi(2). \tag{6.14}$$

Finally, recalling that f (0i = 0{j) from the proof of Lemma 16. 11 leads to V [0o | 01 7^ 
1] ^ 0{j) and £[00 | 0i 7^ 1] ^ ^(2). Taking Um^^oo finishes the proof. 

□ 

Since $F_{st} \in [0, 1]$, Theorem 3 is proved in a straightforward manner using Lemmas 6.1 and 6.2.

7 Fst under Weak Mutation



The goal of this section is to prove Theorems 4-6. Recall that in the weak mutation setting we assume that there is a single mutation on $\Pi_{G/KC}(t)$. We assume that the mutation occurs when $|\Pi_{G/KC}(t)| = L$. More precisely, we select a block $E_{mut}$ uniformly from $\Pi_{G/KC}(T_L)$ and mutate all individuals in $E_{mut}$. Label the blocks of $\Pi_{G/KC,k}(0)$ as $E_{k,1}, E_{k,2}, \dots, E_{k,B_k}$. We refer to any $x_{k,j} \in E_{mut}$ as a mutant.
Set

$$R_k = \sum_{j=1}^{B_k} \chi(E_{mut} \cap E_{k,j} \ne \emptyset), \qquad R = \sum_{k=1}^{d} R_k. \tag{7.1}$$

$R_k$ and $R$ are the number of blocks in $\Pi_{G/KC,k}(0)$ and $\Pi_{G/KC}(0)$ respectively that contain mutants. Note that if a block at $t = 0$ contains a single mutant, then every individual in the block must be a mutant.

At $t = 0$, the set of individuals sampled from the $k$th deme is the disjoint union of $B_k$ blocks. Of these $B_k$ blocks, $R_k$ will contain mutants. We refer to these $R_k$ blocks as mutant blocks. By the symmetry of the G/KC coalescent, which it inherits from the Kingman coalescent, the mutant blocks are equally likely to be any subset of the $B_k$ blocks. Let $\sigma(k, \cdot)$ be a random injective map from $[1, \dots, R_k]$ to $[1, \dots, B_k]$. $\sigma(k, \cdot)$ is chosen from the uniform distribution of all such mappings. Now define

$$A_k = \sum_{j=1}^{R_k} b_{k,\sigma(k,j)}, \tag{7.2}$$

$$p_1 = \frac{1}{d} \sum_{k=1}^{d} A_k, \qquad p_2 = \frac{1}{d} \sum_{k=1}^{d} (A_k)^2.$$

Simple algebra gives

$$F_{st} = \frac{p_2 - p_1^2}{p_1 - p_1^2}. \tag{7.3}$$

We will often speak of the descendants of some block $E \in \Pi_{G/KC}(t)$. By this we mean all $E_i \in \Pi_{G/KC}(0)$ with $E_i \subset E$. We write $\{B_i\}$ for $\{B_i\}_{i=1,\dots,d}$. Below we let $\mathcal{A}(a, b)$ be the set of all injective maps from $[1, 2, \dots, a]$ to $[1, 2, \dots, b]$.

7.1 Some Preliminary Results 

We first characterize the distribution of $\frac{RL}{B}$. The LPLS limiting distribution of $\frac{RL}{B}$ depends on $\lim_{LPLS} L$ and $K$. We have three cases. Define

$V$ = exponential random variable with mean 1.

$W(z)$ = r.v. with density $(1 - \frac{1}{z})(1 - \frac{x}{z})^{z-2}$ for $z \ge 2$ and $0 < x < z$.

$G(z)$ = geometric random variable with success probability $z$.

Then we have the following result.

Lemma 7.1.

$$\lim_{LPLS} \frac{RL}{B} = \begin{cases} V & \text{if } K = 0,\ \lim_{LPLS} L = \infty, \\ W(L) & \text{if } K = 0,\ \lim_{LPLS} L < \infty, \\ K(G(K) + 1) & \text{if } K \ne 0. \end{cases} \tag{7.4}$$


Proof. Before proving the three cases we show that ^t^-t^ 1 . Indeed, by our assumption 



E\B ] 

of -> in the LPLS limit we have 



, 1 , , E\Bi:] 

V[--E[Bi]]^-V[Bi]<^^0. (7.5) 



We can then conclude 



L LB L 
K = lim — : — — lim — r = lim — . (7.6) 

LPLsE[Bi]d LPLS BdE[Bi] lpls B 

Let $j_1, j_2, \dots, j_L$ be the number of descendants from each block in $\Pi_{G/KC}(T_L)$. A standard
result, see for instance |@], is 

$$P(j_1, j_2, \dots, j_L \mid B) = \frac{1}{\binom{B-1}{L-1}}. \tag{7.7}$$

By symmetry we may set $R = j_1$. Then elementary combinatorics gives

$$P(R \mid B) = \frac{\binom{B-R-1}{L-2}}{\binom{B-1}{L-1}}. \tag{7.8}$$



Now we consider the three cases stated in the lemma. For simplicity of notation let 



First take K = 0,L ^ o°. In this case since K" = we have § ^ 0. 



lim < Z < /7) = lim T ^ ' . (7.9) 

LPLS \ - - ^ LPLS ■^fi (B-^\ 



bB 

lim V h:il(l-J—)L-^E{L,R,B) 
LPLS ^„^B-V B-l' ^ ' ' ^ 



where 

llj=l ^ B~R-1 



E{L,R,B)^ (7.10) 



A standard argument then shows, since 4^0 and L^°° that. 



Um 

LPLS 



rb 

P{a<Z<b)^ / c/jcexp[-x]. (7.11) 

J a 
In the case $K = 0$, $\lim_{LPLS} L < \infty$ we can use (7.9) to show that $Z \to W(L)$. Now consider the case $K > 0$. Taking $\varepsilon > 0$,

$$\lim_{LPLS} P(Kj - \varepsilon < Z < Kj + \varepsilon) = \lim_{LPLS} P(R = j) = \lim_{LPLS} \frac{\binom{B-j-1}{L-2}}{\binom{B-1}{L-1}}. \tag{7.12}$$

Now expanding the binomials directly above and taking the LPLS limit gives that $R - 1$ goes to a geometric random variable with success probability $K$. The lemma follows.

□ 
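
The distribution of $R$ given $B$ used in the proof can be checked with exact arithmetic. The sketch below assumes the form $P(R \mid B) = \binom{B-R-1}{L-2} / \binom{B-1}{L-1}$ for $R = 1, \dots, B - L + 1$ and $L \ge 2$, which follows from counting compositions of $B$ into $L$ positive parts. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def pmf_R(R, B, L):
    """P(R | B): the first of L positive parts of a uniform composition of B."""
    return Fraction(comb(B - R - 1, L - 2), comb(B - 1, L - 1))


def total_and_mean(B, L):
    """Return (sum of the pmf, mean of R) over the support 1..B-L+1."""
    support = range(1, B - L + 2)
    total = sum(pmf_R(R, B, L) for R in support)
    mean = sum(R * pmf_R(R, B, L) for R in support)
    return total, mean
```

By exchangeability of the $L$ parts the mean of $R$ is $B/L$, so $RL/B$ has mean 1 before any limit is taken, in agreement with the mean of the exponential limit $V$.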

Lemma 7.1 shows that $\frac{RL}{B}$ has three different limits depending on the scaling of $L$ that we choose. In each case we want to compute the LPLS limit of the mean and variance of $p_1$ and $p_2$ conditioned on $\frac{RL}{B}$. This however is technically cumbersome because prior to taking the LPLS limit, $\frac{RL}{B}$ is discrete. Furthermore, if $K > 0$, the LPLS limit of $\frac{RL}{B}$ is discrete. To deal with all three limits of $\frac{RL}{B}$ simultaneously, and to avoid unneeded technical difficulties, we condition not on $\frac{RL}{B}$ but on the event $\frac{RL}{B} \in \mathcal{H}_h^{\varepsilon}$ for certain sets $\mathcal{H}_h^{\varepsilon}$. More precisely, let $\varepsilon > 0$, then set

$$\mathcal{H}_h^{\varepsilon} = \begin{cases} [h\varepsilon, (h+1)\varepsilon) \text{ for } h = 0, 1, 2, \dots & \text{if } K = 0,\ \lim_{LPLS} L = \infty, \\ [h\varepsilon, (h+1)\varepsilon) \text{ for } h = 0, 1, 2, \dots, \frac{L}{\varepsilon} & \text{if } K = 0,\ \lim_{LPLS} L < \infty, \\ (hK - \varepsilon, hK + \varepsilon) \text{ for } h = 0, 1, 2, \dots & \text{if } K \ne 0. \end{cases} \tag{7.13}$$



Lemma 7.2. Let i be a positive integer with i < B^. Then, 

]i„^£((«.') I <(';,')(> ,7.14) 



For k ^k' and i, i' positive fixed integers, 



bI+bI rI+rI 



hm£[«, I = lim£K I 

LPLS LPLS D — K K 



(7.16) 

Proof. We demonstrate (7.14) and (7.15); the proof of (7.16) is similar. We choose $R$ mutant blocks out of a total of $B$ possible blocks. Each collection of $R$ choices is equally likely, so we have

$$P(R_k \mid R, \{B_i\}) = \frac{\binom{B_k}{R_k} \binom{B - B_k}{R - R_k}}{\binom{B}{R}}. \tag{7.17}$$

From the relation directly above one can show 



PiRk\RM)<{ l i-) (7.18) 



and 



, , , ,Bk\fRY'f rV'-'^' ,RI Bj Rj 

PiR,\RAB^})={j](-] (l--) + ^ + (7.19) 



These two relations give ( 17.141 ) and ( 17.151 ) respectively. 



□ 
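
The counting argument in the proof says that the $R$ mutant blocks form a uniformly chosen subset of the $B$ blocks, so under that reading the number landing in deme $k$ is hypergeometric. The sketch below checks the normalization and the mean $R B_k / B$ exactly; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def pmf_Rk(r, R, Bk, B):
    """P(R_k = r | R, {B_i}) when R of the B blocks are chosen uniformly."""
    return Fraction(comb(Bk, r) * comb(B - Bk, R - r), comb(B, R))


def support(R, Bk, B):
    """Values of r with positive probability."""
    return range(max(0, R - (B - Bk)), min(R, Bk) + 1)


def mean_Rk(R, Bk, B):
    """Exact mean of R_k, which equals R * Bk / B."""
    return sum(r * pmf_Rk(r, R, Bk, B) for r in support(R, Bk, B))
```

For instance, with $R = 5$, $B_k = 8$ and $B = 40$ the mean is exactly 1, so a deme holding a fifth of the blocks is expected to hold a fifth of the mutant blocks.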






The following lemma will be used to control the error expression produced in Lemma 7.2.
Lemma 7.3. If $K = 0$ and $\lim_{LPLS} L < \infty$ assume $h \ne \frac{L}{\varepsilon}, \frac{L}{\varepsilon} - 1$.

lim£[-^ + ^ I ^e^.f]=0 (7.20) 
LPLS ^B-R R ^ B ''^ 

Proof. Let // = + ^. We have, 

H=?±( i^V-- (7.21) 

Bll-f(l)i R 



From ( 17.141 ) we have 



I R,B] < ^{^)Hl + 0{^)) = 0{^). (7.22) 

By our assumptions on h we have lim sup < 1 . So we arrive at, 

RT Rl 

E[H\—^ < 0[E[^ I — e ^/f]). (7.23) 

We now write out the conditional expectation explicitly. Without loss of generality we take 

k=\. 

B\ RL LnABr)BlLs, - I 

But now we note that by Lemma ItTI G J^/f) is asymptotically independent of B. So 
using ( 17.241 ) we have 

lim E[H\—e ^,f] = lim E[^] = lim = 0, (7.25) 

LPLS ^ ' B ''^ LPLS ^B^ LVL5dE[Bi\ 

□ 

We will need to compute the moments of products of $b_{k,j}$. The following lemma shows that such moments can be expressed in terms of the scattering probabilities. In general we will be computing products of $b_{k,j}$ for uniformly selected $j$ over $1, \dots, B_k$. To make this precise let $I$ be a positive integer and let $\gamma$ be a random element of $\mathcal{A}(I, B_k)$ under the uniform distribution. We have the following lemma.

Lemma 7.4. Let I,ji,j2i be fixed positive integers with each ji unique. Set J = j\ + 
72 H 1- ji- Then for B^ > L J < n. 



Bk\f J ^ ' 



(7.26) 
Proof. If we sample $J$ individuals from the $k$th sampled deme, then $\Xi(j_1, j_2, \dots, j_I)$ is the probability that the blocks $E_{k,1}, E_{k,2}, \dots, E_{k,B_k}$ partition the $J$ individuals into $I$ sets of size $j_1, j_2, \dots, j_I$. Taking
ordering into account, there are J^-{j) ways to sample J individuals from There are 

(ni"/;";' )! ^^Y^ ^ssign ji individuals to block £<. /,. With this in mind, if we consider all 
possible combinations, we arrive at 



1 



Since we fix 7, taking the LPLS limit gives the following asymptotics 



bk,l,t'k.2,---'bk.Bi- 



= lim 

LPLS 



(7.27) 



(7.28) 



Noting that T.yes/j = ({Bk-i)\ ^^^'^^ 



= lim /! 

LPLS 



ye.^(i.Bk) 



l>k.l^b^2,-;bk.Bi 

If we now condition over rather than b/^ 1 , fo^t 2 1 • • • i ^a-.Bj^ we have. 



'^UiJi, ■ ■ -Ji) 









= lim II 




Bk LPLS 




\J\,]2, ■■■,]!/ 



i=\ 



r(0 



(7.30) 
□ 



Finally, we show that the distribution of E depends very weakly on Bj^. 
Lemma 7.5. With the notation and conditions of Lemma 7.4,



^e^,f] = S0-i,...,;7). 



(7.31) 



Proof. The proof of this lemma is very similar to that of Lemma lTJl The existence of a limit 
for P{J^^) allows us to remove the conditional dependence on 

□ 



7.2 p1

Now we consider pi conditioned on 

Lemma 7.6. If k = Q and limipisL < °° assume h ^ j,j — I. 



lim E[Lp I I :^e^/]e^/f 

LPLS B 



(7.32) 






Proof. Using the fact that Rk,bk,a{kj) independent when conditioned on B^, we have 
E\pi I R,{B,}]=E[Ak I -£[1^ V(M I ^'i^i}] (7.33) 

= £[^£[/7,,^(,,,) I Bk] I ^E[RkE[bk,aik,i) I B^] I 

.7=1 

Applying Lemma WA\ with J = I = 1, noting E(l) = 1, and then applying Lemma l7T2l leads 
to 

£[Lp, I = -m I = — + + -f). (7.34) 

Now if we condition both sides of the above equation with respect to J^if and apply Lemma 
I7.3l we arrive at the statement of the proof. 

□ 

Having computed the conditional mean of Lpi on J^^, we now consider the conditional 
variance. 

Lemma 7.7. If k = and YimipisE < °° assume h^^,^-l. Then, 
Proof. We start by considering £ [L^p^ | R, {-S,}]. 

,2 d 



E [L^p\ I R, {B,}] = §^ L ^ [^kAk' I R, {Bi}] . (7.36) 



k,k' = l 



So we need to compute E[A\ \ R,{Bi}] and E[AkAy \ R,{Bi}] for k ^ k' . Starting with 
E[Aj. I /?, {Bi}] and expanding out A^t gives 



E[Ai I R,{Bi}] =E[Rk{Rk-l)E[b,^^i^k.x)bk.a(k2) I Bk] I RABi]] (7.37) 

''k,a{k,l) 

Using Lemma l734l gives. 



-E[RkE[bi,,,,ABk]]RABi}] 



Bk{Bk - ^)E[bk,a(k.\)bk.ak.2 I Bk] 2(1,1) 
BkE[bla^,,,)]Bk]^m 



(7.38) 
Plugging (7.38) into (7.37) and using Lemma 7.2 gives



}i^E[At\R,{B.}] 



(7.39) 



: lim 

LPLS 



E[Rt{Rk - I) \R,{Bi}] +Z{2) 



—E[Rk\R,{Bi}] 



R ,R 



B. 



Bk 



B 'B' 'B-R R 

(?)o(A + l); 



^(1-^ , 

B^ B' 'B' 'BR R 



where we have used the relation E(2) 



= 1-2(1,1) 



Bk 



to arrive at the final equality. 



Bk 



Now we turn to E[AiiAi^i \ R, {B,}] for k ^ k' . An argument similar to the one just finished 
fovE[Al\R,{Bi}\ gives 



E[A,A,, I R, {B,-}] = ilf + ilfoi^ + % 



'B' 



'B' B-R R 



Plugging ( 17. 39b and (17.40b into (17.36b gives 



E[L^pi\R,{B,}]^i^f + ^Zi2) 



Bk 



(^)(i_^) + (:^)2o(-^^- 

^ B b' ^ B ' ^B-R R 



Using ( 17.34b we can express the variance as follows, 



V[Lin\R,{B,}]^ -Z{2) 



Bk 



,RL^ 



Bi 



'B-R R 



We then condition on and use Lemmas |7.3| and 1775] to arrive at 



(7.40) 



(7.41) 



(7.42) 



limy[Lp,|f:e^,f]=Hmj£(2) 



£[(f )(l-(f^)^)lf^G^/f]<t>(A). (7.43) 



□ 



7.3 p2 

As we did in the previous section for $p_1$, in this section we compute the mean and variance of $p_2$.

Lemma 7.8. If K — Q and \\mipisE < °° assume 7^ f , f — 1- Let x G Then, 

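\[
\lim_{LPLS} E\Big[Lp_2 \,\Big|\, \frac{RL}{B} \in \mathcal{I}_{\epsilon,h}\Big] = \frac{x^2}{L} + \Xi(2)\, x \Big(1 - \frac{x}{L}\Big) + O(\epsilon). \tag{7.44}
\]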



Proof. We have $p_2 = \frac{1}{d} \sum_{k=1}^{d} A_k^2$. The result then follows from (7.39).



□

Lemma 7.9. IfK = and limipisL < °° assume h ^ j,j — \.



, RL 

limV[Lp2 I — e 



LPLS 



B 



0(A). 



(7.45) 



Proof. We sketch the proof as it is very similar in technique to Lemmas 7.6 and 7.7. Using (7.16) it is not hard to show that for $k \neq k'$,

E[AlAl I =E[Al I RABi\]E[Al I RABi}]+0{^^^ + ^^^). (7.46) 

Since asymptotically the $A_k^2$ are uncorrelated, the variance of $Lp_2$ reduces to the variance of $LA_k^2$. Ignoring error terms this gives,

\[
V[Lp_2 \mid R, \{B_i\}] = \frac{L^2}{d^2} \sum_{k=1}^{d} \Big( E[A_k^4 \mid R, \{B_i\}] - E[A_k^2 \mid R, \{B_i\}]^2 \Big).
\]



From (7.39) we have (again ignoring error terms)

\r 



E[At\R,{B,}Y 



-(1--) 

B^ b' 



(7.47) 



(7.48) 



Using Lemmas 7.2 and 7.4 as we did in Lemma 7.7 gives

E[At\R,{Bi}]=Oi^). 



Plugging (7.48) and (7.49) into (7.47) gives

V[Lp2\R,{B.}]^0{i 



RLE 

B 'd' 



Oil). 



(7.49) 

(7.50) 
□ 



7.4 Limit of $F_{st}$

We can now put together the results of sections 7.2 and 7.3 to demonstrate Theorems 4-6. We start by proving Theorem 4.

Proof of Theorem 4. We will consider $F_{st}$ conditioned on $\frac{RL}{B} \in \mathcal{I}_{\epsilon,h}$ as $\epsilon \to 0$. All the lemmas developed in sections 7.2 and 7.3 include the assumption that if $\kappa = 0$ and $\lim_{LPLS} L < \infty$ then certain values of $h$ are excluded. But as $\epsilon \to 0$, $P(\frac{RL}{B} \in \mathcal{I}_{\epsilon,h}) \to 0$ for these values of $h$. With this in mind, for the rest of this proof we assume that $h$ does not take on these excluded values.
Rewriting (7.3) gives

\[
F_{st} = \frac{Lp_2 - (Lp_1)^2 \frac{1}{L}}{Lp_1 - (Lp_1)^2 \frac{1}{L}}. \tag{7.51}
\]

Now note that by Lemmas 7.6-7.9, since $\lambda = \lim_{LPLS} \frac{L}{d} = 0$, the means of $Lp_1$ and $Lp_2$ go to non-zero limits while the variance collapses. If we plug in the mean values for $Lp_1$ and $Lp_2$ we arrive at

\[
\lim_{LPLS} F_{st} = \Xi(2) + O(\epsilon). \tag{7.52}
\]

Since the limit is independent of $h$ and since $F_{st}$ is bounded, a dominated convergence theorem argument shows $F_{st} \to \Xi(2)$.

□ 

The proofs of Theorems 5 and 6 are harder and require some preparation. The following lemma simplifies the expression for $F_{st}$.

Lemma 7.10. For $\lambda > 0$,

\[
\lim_{LPLS} F_{st} = \lim_{LPLS} \frac{p_2}{p_1}. \tag{7.53}
\]

Proof. We have

\[
F_{st} = \frac{p_2 - p_1^2}{p_1 - p_1^2} = \frac{p_2}{p_1(1 - p_1)} - \frac{p_1}{1 - p_1}. \tag{7.54}
\]

Now note that by Lemmas 7.1 and 7.6, $E[Lp_1] < \infty$. Since $L = \lambda d$ we have $p_1 \to 0$. Using this observation in (7.54) finishes the proof.

□ 
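The algebra behind Lemma 7.10 can be checked numerically. The sketch below assumes the form $F_{st} = (p_2 - p_1^2)/(p_1 - p_1^2)$ read off from (7.54); the function names and the chosen values of $p_1$ and $p_2$ are illustrative only and do not come from the paper. It confirms the split in (7.54) exactly and shows the gap between $F_{st}$ and $p_2/p_1$ shrinking as $p_1 \to 0$ with the ratio $p_2/p_1$ held fixed.

```python
from fractions import Fraction


def f_st(p1, p2):
    """F_st written as (p2 - p1^2) / (p1 - p1^2), the form assumed from (7.54)."""
    return (p2 - p1**2) / (p1 - p1**2)


def f_st_split(p1, p2):
    """The same quantity split as p2 / (p1 (1 - p1)) - p1 / (1 - p1)."""
    return p2 / (p1 * (1 - p1)) - p1 / (1 - p1)


# Hold the ratio p2 / p1 fixed (illustrative value) and let p1 shrink.
ratio = Fraction(3, 10)
for k in range(1, 6):
    p1 = Fraction(1, 10**k)
    p2 = ratio * p1
    # The two expressions agree exactly in rational arithmetic.
    assert f_st(p1, p2) == f_st_split(p1, p2)
    # Gap between the fixed ratio and F_st; it equals p1 (1 - ratio) / (1 - p1).
    print(k, float(ratio - f_st(p1, p2)))
```

The printed gap decreases in proportion to $p_1$, which is the mechanism the proof uses once $p_1 \to 0$ is established.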

Before stating the next lemma we define the random variables $b(z)$ and $s$. The random variable $s$ is given by the following distribution. For $i = 1, 2, 3, \ldots$,

\[
P(s = i) = \frac{i\, P(B_1 = i)}{E[B_1]}. \tag{7.55}
\]

Now we define $b$. Let $\eta$ be a uniform random variable on $\{1, 2, \ldots, z\}$. Then for $a, b \in [0, 1]$,

\[
P(b(z) \in [a, b]) = P(b_{1,\eta} \in [a, b] \mid B_1 = z). \tag{7.56}
\]

So $b(z)$ is the relative size of a block uniformly chosen from $z$ blocks that partition a sample deme. The following lemma expresses $F_{st}$ in terms of $b(s)$.
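Reading (7.55) as $P(s = i) = i\,P(B_1 = i)/E[B_1]$, the law of $s$ is the size-biased version of the law of $B_1$: a deme holding $i$ blocks is $i$ times as likely to contain a uniformly chosen block as a deme holding a single block. The following minimal Python sketch illustrates this; the law of $B_1$ is made up for the example and `size_biased` is a hypothetical helper, not something defined in the paper.

```python
from fractions import Fraction


def size_biased(pmf):
    """Return the size-biased pmf i * P(B_1 = i) / E[B_1] for a pmf on positive integers."""
    mean = sum(i * p for i, p in pmf.items())
    return {i: i * p / mean for i, p in pmf.items()}


# Illustrative (made-up) law of B_1, the number of blocks in one deme.
pmf_B1 = {1: Fraction(1, 2), 2: Fraction(1, 4), 4: Fraction(1, 4)}
pmf_s = size_biased(pmf_B1)

# Direct check on a finite collection of demes whose block counts follow pmf_B1:
# choose one block uniformly among all blocks and record the block count of its deme.
deme_block_counts = [1, 1, 2, 4]
total_blocks = sum(deme_block_counts)
direct = {}
for count in deme_block_counts:
    direct[count] = direct.get(count, Fraction(0)) + Fraction(count, total_blocks)

print(pmf_s)
print(direct)
```

The two dictionaries coincide, which is the finite analogue of the law of large numbers step used later in the proof of Lemma 7.11.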

Lemma 7.11. Assume X > 0. Define 

Y = { '^-^ = (757) 

I G{K) + \ ifK^O. ^'-^'^ 

Let $b_1, b_2, \ldots$ be independent versions of $b$ and $s_1, s_2, \ldots$ be independent versions of $s$. Then,

Y 

lim/72 = lim V (7.59) 

LPLS LPLSJ^^ ■' 

\[
\lim_{LPLS} F_{st} = \lim_{LPLS} \frac{\sum_{j=1}^{Y} b_j(s_j)^2}{\sum_{j=1}^{Y} b_j(s_j)}. \tag{7.60}
\]

Proof. We start by considering $p_1$ and $p_2$ conditioned on $B$. To simplify our index notation let $b_1, b_2, \ldots, b_B$ be some ordering of the collection $b_{k,j}$ for $k = 1, \ldots, d$ and $j = 1, \ldots, B_k$. Let $\zeta(h)$ be the sample deme associated with $b_h$. That is, if $b_h$ is the reindexed version of $b_{k,j}$ then $\zeta(h) = k$.

If we condition on B, pi and p2 are specified by choosing R blocks out of the B possible 
blocks, where each subset of R is equally likely. Then we can specify pi through (recall the 
definition of £/ immediately after ( 17.31 )) 

1 ^ 

Pl = -iJl^fih)^ C^-^l^ 

" h=l 

where f is a random element of £/ (R,B) under the uniform distribution. Now weletgi,. . . ,gR 
be uniform r.v. on [1,2, ... Then we claim limLPLS Pi = liniLPLS jjllh=i ^gir 
through a coupling argument. We select $g_1, g_2, \ldots, g_R$. If each one is different, then we define $f(h) = g_h$. If some $g_i = g_j$ with $i \neq j$, then we select $f$ according to its (uniform) probability distribution. We would like to show that the probability of uncoupling goes to zero in the LPLS limit.

^R\ 1 ,RL, 1 



^(uncoupling | R,B) < I )^ ^ ^ T^Z?" ^^'^^^ 



Lemma ItTT] show s that limLPLS ^ 



exists and is independent of B and since L — > oo we have 

B 



/^(uncoupling | B) ^ 0. (7.63) 

which implies 

R 

lim p] = lim b^, ., (7.64) 

LPLs' LPLSj^j 
R 

lim P2 = lim V b^ . 

LPLS LPLS 
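The coupling step rests on a birthday-type estimate: $R$ independent uniform draws from $B$ blocks contain a repeat with probability at most $\binom{R}{2}/B$, which appears to be the first bound in (7.62). The Python sketch below compares the exact repeat probability with this union bound. The function names and the values of $R$ and $B$ are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction
from math import comb


def repeat_probability(R, B):
    """Exact probability that R independent uniform draws from B items are not all distinct."""
    all_distinct = Fraction(1)
    for i in range(R):
        all_distinct *= Fraction(B - i, B)
    return 1 - all_distinct


def union_bound(R, B):
    """Union bound over the C(R, 2) pairs of draws, each pair colliding with probability 1/B."""
    return Fraction(comb(R, 2), B)


# Illustrative values: the bound is tight for R = 2 and small whenever R^2 is small next to B.
for R, B in [(2, 10), (5, 100), (5, 10_000), (20, 10_000)]:
    exact = repeat_probability(R, B)
    bound = union_bound(R, B)
    assert exact <= bound
    print(R, B, float(exact), float(bound))
```

When no repeat occurs the with-replacement draws $g_1, \ldots, g_R$ form a uniformly chosen set of $R$ distinct blocks, so the coupled variables agree on that event.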

Now we show that we may replace the $R$ by $Y$. We restrict our attention to the case $\kappa = 0$ and consider $p_1$ only. The case $\kappa \neq 0$ is much simpler since $R$ converges to a geometric distribution, and the analysis of $p_2$ is similar to that of $p_1$. We first show that we can replace
/jby F'= [(f )^]. 

(7.65) 



Finally we would like to show that we can replace $Y'$ by $Y$. To do this we recall that we have
split [0,°o) into intervals By LemmaO P(^ e J^,f ) P{V G J^,f ). So we have 



E{Bi] 

E[\ L - i: I I ^. § € ^/f ] <E[t b,^] < e^E[b,]. (7.66) 
j=i j=i ^ j=i ^

Now note that $E[b_g \mid B] = E[\frac{1}{B} \sum_{j=1}^{B} b_j \mid B] = \frac{d}{B}$. Plugging this observation into the inequality directly above gives

^[| L - i I V, ^ e J^] < E[^^] - e. (7.67) 



7=1 j=i 



Now taking $\epsilon$ to zero shows that we can replace $Y'$ by $Y$.
Now we would like to show



Y Y 



To do this we compute the LPLS limit of the characteristic function of $\sum_{j=1}^{Y} b_{g_j}$, $\psi(v)$. Recall that $\zeta(g)$ is the sample deme to which $b_g$ is associated.

V/(v) = £[exp[/vZ7,]]'' = (,) = ;)^?[exp[/vZ7,] |Bf = ;] j . (7.69) 

If we condition on $B_1, B_2, \ldots$ then

pfp _ .■ I r D 1 ^ _ iLi ^ = J)J _ 71 ^k=i ^ i^k = j)j 

^y^as) - J I i^'i) B ~ ifi ^ ^ ^ 

d 

Now note that the $\mathbb{1}(B_k = j)$ are i.i.d., so by the law of large numbers $\frac{1}{d} \sum_{k=1}^{d} \mathbb{1}(B_k = j)\, j \to P(B_1 = j)\, j$, while $\lim_{LPLS} \frac{B}{d E[B_1]} = 1$. So defining $\delta(j)$ through the following relation

£t,/(«,^./); ,pa^(,^,(,„_ 

and $\delta(j) \to 0$. Plugging (7.71) into (7.70) and then plugging the result into (7.69) gives

"^(^) = {ii^^^^i^ + 5(;)))£[exp[.VZ.,]|Bf(,) = j]^ (7.72) 
1 



£[Z?i(l + 5(Bi))exp[/vZ7,]|Cfe) = i: 



We now expand $\exp[ivb_g]$ in Taylor series. From Lemma 7.4 we have the following relation for the moments of $b_g$, for $k \geq 1$.

Eib'^ I C(^) - ^E[^Z{k) I Bi]. (7.73) 
81 

Plugging (7.73) into (7.72) gives

1 °° ('/v'l*^

"^^''^ = ^ ^^'^^ ^ (7-74)

Now recall Y = ^-^^ and notice that ] — > °o since K" ^ 0. These facts lead to

^^s ^^'^ - iTs^^^l + <i ^-^mB.)]) (7.75) 

But since $\delta(j) \to 0$ for all $j$ we have,

An almost identical argument shows that the characteristic function of $\sum_{j=1}^{Y} b_j(s_j)$ converges to the same limit. We have demonstrated (7.58). (7.59) is demonstrated in an identical way. To demonstrate (7.60) we simply compute the characteristic function of the pair $(p_1, p_2)$. The arguments are almost identical to those we made in deriving (7.58) so we do not include them here.

□ 

We are finally ready to state and prove Theorems 5 and 6. Their proofs are very similar so we prove only Theorem 5.

Proof of Theorem 5. Set

G 

k=\ 

Q 

P2 = l^Xf 



k ■ 

k=l 



Let $v = (v_1, v_2)$. We need to show

jlim £[exp[/v • (pupi)]] = E[exp[iv ■ {pupi)]]- (7.78) 

We have actually already done most of the work in the proof of Lemma 7.11. The arguments in the proof of Lemma 7.11 show



Hm£[exp[/v.(pi,p2)]]=exp[-^^ )^^Z{2k-j)] (7.79) 
A standard computation shows that this is exactly the value of the right-hand side of (7.78).

□